Abstract



We study the problem of reconstructing an unknown matrix M of rank r and dimension d using $O(rd\,\mathrm{poly}\log d)$ Pauli measurements. This has applications in quantum state tomography, and is a non-commutative analogue of a well-known problem in compressed sensing: recovering a sparse vector from a few of its Fourier coefficients.

We show that almost all sets of $O(rd\log^6 d)$ Pauli measurements satisfy the rank-r restricted isometry property (RIP). This implies that M can be recovered from a fixed ("universal") set of Pauli measurements, using nuclear-norm minimization (e.g., the matrix Lasso), with nearly-optimal bounds on the error. A similar result holds for any class of measurements that use an orthonormal operator basis whose elements have small operator norm. Our proof uses Dudley's inequality for Gaussian processes, together with bounds on covering numbers obtained via entropy duality.



1 Introduction 

Low-rank matrix recovery is the following problem: let M be some unknown matrix of dimension d and rank $r \ll d$, and let $A_1, A_2, \ldots, A_m$ be a set of measurement matrices; then can one reconstruct M from its inner products $\mathrm{tr}(M^*A_1), \mathrm{tr}(M^*A_2), \ldots, \mathrm{tr}(M^*A_m)$? This problem has many applications in machine learning [1, 2], e.g., collaborative filtering (the Netflix problem). Remarkably, it turns out that for many useful choices of measurement matrices, low-rank matrix recovery is possible, and can even be done efficiently. For example, when the $A_i$ are Gaussian random matrices, then it is known that $m = O(rd)$ measurements are sufficient to uniquely determine M, and furthermore, M can be reconstructed by solving a convex program (minimizing the nuclear norm) [3, 4, 5]. Another example is the "matrix completion" problem, where the measurements return a random subset of matrix elements of M; in this case, $m = O(rd\,\mathrm{poly}\log d)$ measurements suffice, provided that M satisfies some "incoherence" conditions [6, 7, 8, 9, 10].

The focus of this paper is on a different class of measurements, known as Pauli measurements. Here, the $A_i$ are randomly chosen elements of the Pauli basis, a particular orthonormal basis of $\mathbb{C}^{d\times d}$. The Pauli basis is a non-commutative analogue of the Fourier basis in $\mathbb{C}^d$; thus, low-rank matrix recovery using Pauli measurements can be viewed as a generalization of the idea of compressed sensing of sparse vectors using their Fourier coefficients [11, 12]. In addition, this problem has applications in quantum state tomography, the task of learning an unknown quantum state by performing measurements [13]. This is because most quantum states of physical interest are accurately described by density matrices that have low rank; and Pauli measurements are especially easy to carry out in an experiment (due to the tensor product structure of the Pauli basis).
In this paper we show stronger results on low-rank matrix recovery from Pauli measurements. Previously [13, 8], it was known that, for every rank-r matrix $M \in \mathbb{C}^{d\times d}$, almost all choices of $m = O(rd\,\mathrm{poly}\log d)$ random Pauli measurements will lead to successful recovery of M. Here we show a stronger statement: there is a fixed ("universal") set of $m = O(rd\,\mathrm{poly}\log d)$ Pauli measurements, such that for all rank-r matrices $M \in \mathbb{C}^{d\times d}$, we have successful recovery.¹ We do this by showing that the random Pauli sampling operator obeys the "restricted isometry property" (RIP). Intuitively, RIP says that the sampling operator is an approximate isometry, acting on the set of all low-rank matrices. In geometric terms, it says that the sampling operator embeds the manifold of low-rank matrices into $O(rd\,\mathrm{poly}\log d)$ dimensions, with low distortion in the 2-norm.

RIP for low-rank matrices is a very strong property, and prior to this work, it was only known to hold for very unstructured types of random measurements, such as Gaussian measurements [3], which are unsuitable for most applications. RIP was known to fail in the matrix completion case, and whether it held for Pauli measurements was an open question. Once we have established RIP for Pauli measurements, we can use known results [3, 4, 5] to show low-rank matrix recovery from a universal set of Pauli measurements. In particular, using [5], we can get nearly-optimal universal bounds on the error of the reconstructed density matrix, when the data are noisy; and we can even get bounds on the recovery of arbitrary (not necessarily low-rank) matrices. These RIP-based bounds are qualitatively stronger than those obtained using "dual certificates" [14] (though the latter technique is applicable in some situations where RIP fails).

In the context of quantum state tomography, this implies that, given a quantum state that consists of a low-rank component $M_r$ plus a residual full-rank component $M_c$, we can reconstruct $M_r$ up to an error that is not much larger than $M_c$. In particular, let $\|\cdot\|_*$ denote the nuclear norm, and let $\|\cdot\|_F$ denote the Frobenius norm. Then the error can be bounded in the nuclear norm by $O(\|M_c\|_*)$ (assuming noiseless data), and it can be bounded in the Frobenius norm by $O(\|M_c\|_F\,\mathrm{poly}\log d)$ (which holds even with noisy data)². This shows that our reconstruction is nearly as good as the best rank-r approximation to M (which is given by the truncated SVD). In addition, a completely arbitrary quantum state can be reconstructed up to an error of $O(1/\sqrt{r})$ in Frobenius norm. Lastly, the RIP gives some insight into the optimal design of tomography experiments, in particular, the tradeoff between the number of measurement settings (which is essentially m), and the number of repetitions of the experiment at each setting (which determines the statistical noise that enters the data) [15].

These results can be generalized beyond the class of Pauli measurements. Essentially, one can replace the Pauli basis with any orthonormal basis of $\mathbb{C}^{d\times d}$ that is incoherent, i.e., whose elements have small operator norm (of order $O(1/\sqrt{d})$, say); a similar generalization was noted in the earlier results of [8]. Also, our proof shows that the RIP actually holds in a slightly stronger sense: it holds not just for all rank-r matrices, but for all matrices X that satisfy $\|X\|_* \le \sqrt{r}\,\|X\|_F$.

To prove this result, we combine a number of techniques that have appeared elsewhere. RIP results 
were previously known for Gaussian measurements and some of their close relatives [3]. Also,
restricted strong convexity (RSC), a similar but somewhat weaker property, was recently shown
in the context of the matrix completion problem (with additional "non-spikiness" conditions) [10].
These results follow from covering arguments (i.e., using a concentration inequality to upper-bound 
the failure probability on each individual low-rank matrix X, and then taking the union bound over 
all such X). Showing RIP for Pauli measurements seems to be more delicate, however. Pauli 
measurements have more structure and less randomness, so the concentration of measure phenomena 
are weaker, and the union bound no longer gives the desired result. 

Instead, one must take into account the favorable correlations between the behavior of the sampling operator on different matrices: intuitively, if two low-rank matrices M and M' have overlapping supports, then good behavior on M is positively correlated with good behavior on M'. This can be done by transforming the problem into a Gaussian process, and using Dudley's entropy bound. This is the same approach used in classical compressed sensing, to show RIP for Fourier measurements [12, 11]. The key difference is that in our case, the Gaussian process is indexed by low-rank matrices, rather than sparse vectors. To bound the correlations in this process, one then needs to bound the covering numbers of the nuclear norm ball (of matrices), rather than the $\ell_1$ ball (of vectors). This requires a different technique, using entropy duality, which is due to Guedon et al. [16]. (See also the related work in [17].)

¹Note that in the universal result, m is slightly larger, by a factor of poly log d.
²However, this bound is not universal.

As a side note, we remark that matrix recovery can sometimes fail because there exist large sets of up to d Pauli matrices that all commute, i.e., they have a simultaneous eigenbasis $\phi_1, \ldots, \phi_d$. (These $\phi_i$ are of interest in quantum information: they are called stabilizer states [18].) If one were to measure such a set of Paulis, one would gain complete knowledge about the diagonal elements of the unknown matrix M in the $\phi_i$ basis, but one would learn nothing about the off-diagonal elements. This is reminiscent of the difficulties that arise in matrix completion. However, in our case, these pathological cases turn out to be rare, since it is unlikely that a random subset of Pauli matrices will all commute.
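The commuting-set obstruction is easy to test mechanically. The Python sketch below is illustrative only: the function names are hypothetical, and `fraction_all_commuting` draws distinct Pauli strings (sampling without replacement), which differs slightly from the iid model of Section 2. It labels an n-qubit Pauli matrix $P_1 \otimes \cdots \otimes P_n$ by a string over I, X, Y, Z and uses the standard fact that two such products commute exactly when the number of positions holding two distinct non-identity factors is even.

```python
import itertools
import random

def paulis_commute(p: str, q: str) -> bool:
    """Two n-qubit Pauli strings commute iff the number of positions where both
    factors are non-identity and different (hence anticommute) is even."""
    if len(p) != len(q):
        raise ValueError("Pauli strings must act on the same number of qubits")
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0

def all_commute(paulis) -> bool:
    """True if the Paulis commute pairwise, i.e. they share a simultaneous eigenbasis."""
    return all(paulis_commute(p, q) for p, q in itertools.combinations(paulis, 2))

def fraction_all_commuting(n_qubits: int, subset_size: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of how often a uniformly random set of distinct
    Pauli strings is entirely commuting (the pathological case above)."""
    rng = random.Random(seed)
    labels = ["".join(t) for t in itertools.product("IXYZ", repeat=n_qubits)]
    hits = sum(all_commute(rng.sample(labels, subset_size)) for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    print(all_commute(["II", "ZI", "IZ", "ZZ"]))  # True: a commuting set of size d = 4
    print(fraction_all_commuting(n_qubits=3, subset_size=6, trials=2000))
```

For n = 2, the set {II, ZI, IZ, ZZ} is a maximal commuting set of size d = 4 (its common eigenbasis is the computational basis); varying `subset_size` in the Monte Carlo estimate shows how quickly random sets stop being entirely commuting.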

Finally, we note that there is a large body of related work on estimating a low-rank matrix by solving a regularized convex program; see, e.g., [19, 20].

This paper is organized as follows. In section 2, we state our results precisely, and discuss some 
specific applications to quantum state tomography. In section 3 we prove the RIP for Pauli matrices, 
and in section 4 we discuss some directions for future work. Some technical details appear in 
sections A and B.

Notation: For vectors, $\|\cdot\|_2$ denotes the $\ell_2$ norm. For matrices, $\|\cdot\|_p$ denotes the Schatten p-norm, $\|X\|_p = (\sum_i \sigma_i(X)^p)^{1/p}$, where $\sigma_i(X)$ are the singular values of X. In particular, $\|\cdot\|_* = \|\cdot\|_1$ is the trace or nuclear norm, $\|\cdot\|_F = \|\cdot\|_2$ is the Frobenius norm, and $\|\cdot\| = \|\cdot\|_\infty$ is the operator norm. Finally, for matrices, $A^*$ is the adjoint of A, and $(\cdot,\cdot)$ is the Hilbert-Schmidt inner product, $(A, B) = \mathrm{tr}(A^*B)$. Calligraphic letters denote superoperators acting on matrices. Also, $|A)(A|$ is the superoperator that maps every matrix $X \in \mathbb{C}^{d\times d}$ to the matrix $A\,\mathrm{tr}(A^*X)$.
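The notation can be made concrete with a small Python sketch that works directly from a list of singular values (computing them requires an SVD routine, which is assumed and not shown; the helper names are hypothetical). It also tests membership in the set $\{X : \|X\|_* \le \sqrt{r}\,\|X\|_F\}$ used in our RIP result: by Cauchy-Schwarz, every matrix of rank at most r lies in this set, but so do some full-rank matrices.

```python
import math

def schatten_norm(singular_values, p):
    """Schatten p-norm ||X||_p = (sum_i sigma_i(X)^p)^(1/p), from the singular values.
    p = 1: nuclear norm; p = 2: Frobenius norm; p = math.inf: operator norm."""
    s = [abs(x) for x in singular_values]
    if p == math.inf:
        return max(s, default=0.0)
    return sum(x ** p for x in s) ** (1.0 / p)

def in_relaxed_rank_set(singular_values, r, tol=1e-12):
    """Test ||X||_* <= sqrt(r) ||X||_F. Every matrix of rank <= r passes (Cauchy-Schwarz),
    and so do some full-rank matrices whose spectrum is dominated by a few large values."""
    return schatten_norm(singular_values, 1) <= math.sqrt(r) * schatten_norm(singular_values, 2) + tol

if __name__ == "__main__":
    spectrum = [3.0, 2.0, 1.0]  # singular values of a rank-3 matrix
    print(schatten_norm(spectrum, 1), schatten_norm(spectrum, 2), schatten_norm(spectrum, math.inf))
    print(in_relaxed_rank_set(spectrum, 3), in_relaxed_rank_set(spectrum, 2))  # True False
    print(in_relaxed_rank_set([1.0, 0.01, 0.01, 0.01], 2))                     # True, despite rank 4
```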

2 Our Results 

We will consider the following approach to low-rank matrix recovery. Let $M \in \mathbb{C}^{d\times d}$ be an unknown matrix of rank at most r. Let $W_1, \ldots, W_{d^2}$ be an orthonormal basis for $\mathbb{C}^{d\times d}$, with respect to the inner product $(A, B) = \mathrm{tr}(A^*B)$. We choose m basis elements, $S_1, \ldots, S_m$, iid uniformly at random from $\{W_1, \ldots, W_{d^2}\}$ ("sampling with replacement"). We then observe the coefficients $(S_i, M)$. From this data, we want to reconstruct M.

For this to be possible, the measurement matrices $W_i$ must be "incoherent" with respect to M. Roughly speaking, this means that the inner products $(W_i, M)$ must be small. Formally, we say that the basis $W_1, \ldots, W_{d^2}$ is incoherent if the $W_i$ all have small operator norm,

$$\|W_i\| \le K/\sqrt{d}, \tag{1}$$

where K is a constant.³ (This assumption was also used in [8].)

Before proceeding further, let us sketch the connection between this problem and quantum state tomography. Consider a system of n qubits, with Hilbert space dimension $d = 2^n$. We want to learn the state of the system, which is described by a density matrix $\rho \in \mathbb{C}^{d\times d}$; ρ is positive semidefinite, has trace 1, and has rank $r \ll d$ when the state is nearly pure. There is a class of convenient (and experimentally feasible) measurements, which are described by Pauli matrices (also called Pauli observables). These are matrices of the form $P_1 \otimes \cdots \otimes P_n$, where ⊗ denotes the tensor product (Kronecker product), and each $P_i$ is a 2 × 2 matrix chosen from the following four possibilities:

$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{2}$$

One can estimate expectation values of Pauli observables, which are given by $(\rho, (P_1 \otimes \cdots \otimes P_n))$. This is a special case of the above measurement model, where the measurement matrices $W_i$ are the (scaled) Pauli observables $(P_1 \otimes \cdots \otimes P_n)/\sqrt{d}$, and they are incoherent with $\|W_i\| \le K/\sqrt{d}$, K = 1.
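As a sanity check on (1) and (2), the following pure-Python sketch (hypothetical helper names; nested lists stand in for a matrix library) builds the $d^2$ scaled Pauli matrices for n qubits, verifies orthonormality under $(A, B) = \mathrm{tr}(A^*B)$, and verifies $W^*W = I/d$, which means every singular value of W equals $1/\sqrt{d}$ and hence K = 1.

```python
import itertools
import math

# Single-qubit Pauli matrices I, sigma_x, sigma_y, sigma_z from (2), as nested lists.
PAULI_1Q = {
    "I": [[1, 0], [0, 1]],
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}

def kron(a, b):
    """Kronecker (tensor) product of two square matrices given as nested lists."""
    na, nb = len(a), len(b)
    return [[a[i // nb][j // nb] * b[i % nb][j % nb] for j in range(na * nb)]
            for i in range(na * nb)]

def scaled_pauli_basis(n):
    """Map each label P_1...P_n to W = (P_1 ⊗ ... ⊗ P_n) / sqrt(d), with d = 2**n."""
    d = 2 ** n
    basis = {}
    for labels in itertools.product("IXYZ", repeat=n):
        w = [[1]]
        for c in labels:
            w = kron(w, PAULI_1Q[c])
        basis["".join(labels)] = [[x / math.sqrt(d) for x in row] for row in w]
    return basis

def hs_inner(a, b):
    """Hilbert-Schmidt inner product (A, B) = tr(A^* B) = sum_ij conj(A_ij) B_ij."""
    return sum(x.conjugate() * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def adjoint_times(a, b):
    """A^* B for square nested-list matrices: (A^* B)_ij = sum_k conj(A_ki) B_kj."""
    n = len(a)
    return [[sum(a[k][i].conjugate() * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

if __name__ == "__main__":
    n, d = 2, 4
    basis = scaled_pauli_basis(n)
    # Orthonormality: (W_a, W_b) = 1 if a == b else 0.
    ortho_err = max(abs(hs_inner(basis[a], basis[b]) - (a == b)) for a in basis for b in basis)
    # Incoherence: W^* W = I / d, so all singular values of W equal 1/sqrt(d), i.e. K = 1.
    incoh_err = max(abs(adjoint_times(w, w)[i][j] - (i == j) / d)
                    for w in basis.values() for i in range(d) for j in range(d))
    print(len(basis), ortho_err < 1e-12, incoh_err < 1e-12)  # 16 True True
```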



³Note that $\|W_i\|$ is the maximum inner product between $W_i$ and any rank-1 matrix M (normalized so that $\|M\|_F = 1$).
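Now we return to our discussion of the general problem. We choose $S_1, \ldots, S_m$ iid uniformly at random from $\{W_1, \ldots, W_{d^2}\}$, and we define the sampling operator $\mathcal{A}: \mathbb{C}^{d\times d} \to \mathbb{C}^m$ as

$$(\mathcal{A}(X))_i = \sqrt{d^2/m}\;\mathrm{tr}(S_i^*X), \quad i = 1, \ldots, m. \tag{3}$$

The normalization is chosen so that $\mathbb{E}\,\mathcal{A}^*\mathcal{A} = \mathcal{I}$. (Note that $\mathcal{A}^*\mathcal{A} = \frac{d^2}{m}\sum_{j=1}^m |S_j)(S_j|$; since each $S_j$ is uniform over an orthonormal basis, $\mathbb{E}\,|S_j)(S_j| = d^{-2}\sum_{k=1}^{d^2}|W_k)(W_k| = d^{-2}\mathcal{I}$.)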
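The following Python sketch implements the sampling operator (3) for the single-qubit case d = 2 (the basis list and function names are assumptions for illustration). It checks that $\mathbb{E}\|\mathcal{A}(X)\|_2^2 = \|X\|_F^2$, which is the normalization $\mathbb{E}\,\mathcal{A}^*\mathcal{A} = \mathcal{I}$ written as a quadratic form, and it reports an empirical distortion. The distortion over a finite list of test matrices is only a lower bound on the RIP constant, never a certificate.

```python
import math
import random

INV_SQRT2 = 1 / math.sqrt(2)
# Scaled single-qubit Pauli basis W_k = P_k / sqrt(d) for d = 2 (orthonormal, K = 1).
PAULI_BASIS_D2 = [
    [[INV_SQRT2, 0], [0, INV_SQRT2]],                    # I / sqrt(2)
    [[0, INV_SQRT2], [INV_SQRT2, 0]],                    # sigma_x / sqrt(2)
    [[0, -1j * INV_SQRT2], [1j * INV_SQRT2, 0]],         # sigma_y / sqrt(2)
    [[INV_SQRT2, 0], [0, -INV_SQRT2]],                   # sigma_z / sqrt(2)
]

def hs_inner(a, b):
    """(A, B) = tr(A^* B)."""
    return sum(x.conjugate() * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def frobenius_sq(x):
    return sum(abs(z) ** 2 for row in x for z in row)

def make_sampling_operator(basis, m, rng):
    """Draw S_1..S_m iid uniformly (with replacement) from `basis` and return
    X -> A(X) with (A(X))_i = (d / sqrt(m)) tr(S_i^* X), as in (3)."""
    d = len(basis[0])
    samples = [rng.choice(basis) for _ in range(m)]
    scale = d / math.sqrt(m)
    return lambda x: [scale * hs_inner(s, x) for s in samples]

def expected_sq_norm(basis, x):
    """E ||A(X)||_2^2 = (d^2/m) * m * (1/d^2) * sum_k |(W_k, X)|^2, independent of m."""
    return sum(abs(hs_inner(w, x)) ** 2 for w in basis)

def empirical_delta(op, test_matrices):
    """max |‖A(X)‖_2 / ‖X‖_F - 1| over test matrices: a lower bound on δ in (6)."""
    return max(abs(math.sqrt(sum(abs(v) ** 2 for v in op(x)) / frobenius_sq(x)) - 1)
               for x in test_matrices)

if __name__ == "__main__":
    rng = random.Random(1)
    X = [[1, 0], [0, 0]]                                          # a rank-1 projector
    print(expected_sq_norm(PAULI_BASIS_D2, X), frobenius_sq(X))   # agree up to rounding (Parseval)
    draws = [make_sampling_operator(PAULI_BASIS_D2, 4, rng)(X) for _ in range(2000)]
    print(sum(sum(abs(v) ** 2 for v in a) for a in draws) / len(draws))  # close to 1
```

With m = 4 and X a rank-1 projector, each draw gives $\|\mathcal{A}(X)\|_2^2 = (2/m)\cdot\mathrm{Binomial}(m, 1/2)$, so individual draws fluctuate widely even though the average is 1; this is the kind of fluctuation that a sufficiently large m must suppress uniformly over all low-rank X.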

We assume we are given the data $y = \mathcal{A}(M) + z$, where $z \in \mathbb{C}^m$ is some (unknown) noise contribution. We will construct an estimator $\hat{M}$ by minimizing the nuclear norm, subject to the constraints specified by y. (Note that one can view the nuclear norm as a convex relaxation of the rank function; thus these estimators can be computed efficiently.) One approach is the matrix Dantzig selector:

$$\hat{M} = \arg\min_X \|X\|_* \quad \text{such that} \quad \|\mathcal{A}^*(y - \mathcal{A}(X))\| \le \lambda. \tag{4}$$

Alternatively, one can solve a regularized least-squares problem, also called the matrix Lasso:

$$\hat{M} = \arg\min_X \tfrac{1}{2}\|\mathcal{A}(X) - y\|_2^2 + \mu\|X\|_*. \tag{5}$$

Here, the parameters λ and μ are set according to the strength of the noise component z (we will discuss this later). We will be interested in bounding the error of these estimators. To do this, we will show that the sampling operator $\mathcal{A}$ satisfies the restricted isometry property (RIP).
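The paper does not prescribe a solver for (5). One standard choice, stated here as an assumption rather than as the method used in this work, is proximal gradient descent: the proximal operator of $\tau\|\cdot\|_*$ is singular-value soft-thresholding $\mathcal{D}_\tau$. For the objective $\tfrac{1}{2}\|\mathcal{A}(X) - y\|_2^2 + \mu\|X\|_*$, one iteration reads:

```latex
\[
\begin{aligned}
X_{k+1} &= \mathcal{D}_{t\mu}\big(X_k - t\,\mathcal{A}^*(\mathcal{A}(X_k) - y)\big),
  \qquad 0 < t \le 1/\|\mathcal{A}^*\mathcal{A}\|_{\mathrm{op}},\\
\mathcal{D}_\tau(Y) &= U\,\mathrm{diag}\big(\max(\sigma_i(Y) - \tau,\, 0)\big)\,V^*
  \qquad \text{for an SVD } Y = U\,\mathrm{diag}(\sigma_i(Y))\,V^*.
\end{aligned}
\]
```

If the least-squares term is written without the factor 1/2, the objective equals twice the one above with μ replaced by μ/2, so the same update applies with threshold $t\mu/2$.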

2.1 RIP for Pauli Measurements 

Fix some constant $0 < \delta < 1$. Fix d, and some set $U \subset \mathbb{C}^{d\times d}$. We say that $\mathcal{A}$ satisfies the restricted isometry property (RIP) over U if, for all $X \in U$, we have

$$(1-\delta)\|X\|_F \le \|\mathcal{A}(X)\|_2 \le (1+\delta)\|X\|_F. \tag{6}$$

(Here, $\|\mathcal{A}(X)\|_2$ denotes the $\ell_2$ norm of a vector, while $\|X\|_F$ denotes the Frobenius norm of a matrix.) When U is the set of all $X \in \mathbb{C}^{d\times d}$ with rank r, this is precisely the notion of RIP studied in [3, 5]. We will show that Pauli measurements satisfy the RIP over a slightly larger set (the set of all $X \in \mathbb{C}^{d\times d}$ such that $\|X\|_* \le \sqrt{r}\,\|X\|_F$), provided the number of measurements m is at least $\Omega(rd\,\mathrm{poly}\log d)$. This result generalizes to measurements in any basis with small operator norm.

Theorem 2.1 Fix some constant $0 < \delta < 1$. Let $\{W_1, \ldots, W_{d^2}\}$ be an orthonormal basis for $\mathbb{C}^{d\times d}$ that is incoherent in the sense of (1). Let $m = CK^2\cdot rd\log^6 d$, for some constant C that depends only on δ, $C = O(1/\delta^2)$. Let $\mathcal{A}$ be defined as in (3). Then, with high probability (over the choice of $S_1, \ldots, S_m$), $\mathcal{A}$ satisfies the RIP over the set of all $X \in \mathbb{C}^{d\times d}$ such that $\|X\|_* \le \sqrt{r}\,\|X\|_F$. Furthermore, the failure probability is exponentially small in $\delta^2 C$.

We will prove this theorem in section 3. In the remainder of this section, we discuss its applications 
to low-rank matrix recovery, and quantum state tomography in particular.

2.2 Applications 

By combining Theorem 2.1 with previous results [3, 4, 5], we immediately obtain bounds on the accuracy of the matrix Dantzig selector (4) and the matrix Lasso (5). In particular, for the first time we can show universal recovery of low-rank matrices via Pauli measurements, and near-optimal bounds on the accuracy of the reconstruction when the data is noisy [5]. (Similar results hold for measurements in any incoherent operator basis.) These RIP-based results improve on the earlier results based on dual certificates [13, 8, 14]. See [3, 4, 5] for details.

Here, we will sketch a couple of these results that are of particular interest for quantum state tomography. In this setting, M is the density matrix describing the state of a quantum mechanical object, and $\mathcal{A}(M)$ is a vector of Pauli expectation values for the state M. (M has some additional properties: it is positive semidefinite, and has trace 1; thus $\mathcal{A}(M)$ is a real vector.) There are two main issues that arise. First, M is not precisely low-rank. In many situations, the ideal state has low rank (for instance, a pure state has rank 1); however, for the actual state observed in an experiment, the density matrix M is full-rank with decaying eigenvalues. Typically, we will be interested in obtaining a good low-rank approximation to M, ignoring the tail of the spectrum.
Secondly, the measurements of $\mathcal{A}(M)$ are inherently noisy. We do not observe $\mathcal{A}(M)$ directly; rather, we estimate each entry $(\mathcal{A}(M))_i$ by preparing many copies of the state M, measuring the Pauli observable $S_i$ on each copy, and averaging the results. Thus, we observe $y_i = (\mathcal{A}(M))_i + z_i$, where $z_i$ is binomially distributed. When the number of experiments being averaged is large, $z_i$ can be approximated by Gaussian noise. We will be interested in getting an estimate of M that is stable with respect to this noise. (We remark that one can also reduce the statistical noise by performing more repetitions of each experiment. This suggests the possibility of a tradeoff between the accuracy of estimating each parameter, and the number of parameters one chooses to measure overall. This will be discussed elsewhere [15].)
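The binomial noise model can be simulated directly. In this Python sketch (hypothetical names; the single-shot model is the standard one for a ±1-valued observable), each single-shot measurement of a Pauli observable P returns +1 with probability $(1 + \langle P\rangle)/2$, where $\langle P\rangle = \mathrm{tr}(PM)$; averaging N shots gives an unbiased estimate with standard deviation $\sqrt{(1 - \langle P\rangle^2)/N}$.

```python
import math
import random

def estimate_pauli_expectation(true_value, shots, rng):
    """Average of `shots` single-shot outcomes in {+1, -1}; each is +1 with probability
    (1 + <P>)/2, so the number of +1 outcomes is Binomial(shots, (1 + <P>)/2)."""
    p_plus = (1.0 + true_value) / 2.0
    plus = sum(1 for _ in range(shots) if rng.random() < p_plus)
    return (2 * plus - shots) / shots

def shot_noise_std(true_value, shots):
    """Standard deviation of that average: sqrt((1 - <P>^2) / shots)."""
    return math.sqrt((1.0 - true_value ** 2) / shots)

def noisy_entry(true_value, shots, d, m, rng):
    """One noisy entry y_i. With S_i = P / sqrt(d) in (3), the exact entry is
    sqrt(d/m) * <P>, so z_i has standard deviation sqrt(d/m) * shot_noise_std(<P>, shots)."""
    return math.sqrt(d / m) * estimate_pauli_expectation(true_value, shots, rng)

if __name__ == "__main__":
    rng = random.Random(7)
    for shots in (100, 10_000):
        print(shots, estimate_pauli_expectation(0.3, shots, rng), shot_noise_std(0.3, shots))
```

Under the scaling (3), the entry $y_i$ is $\sqrt{d/m}$ times this average, so the noise level in y shrinks like $1/\sqrt{N}$ as repetitions are added, which is one side of the accuracy-versus-number-of-settings tradeoff.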

We would like to reconstruct M up to a small error in the nuclear or Frobenius norm. Let $\hat{M}$ be our estimate. Bounding the error in nuclear norm implies that, for any measurement allowed by quantum mechanics, the probability of distinguishing the state $\hat{M}$ from M is small. Bounding the error in Frobenius norm implies that the difference $\hat{M} - M$ is highly "mixed" (and thus does not contribute to the coherent or "quantum" behavior of the system).

We now sketch a few results from [4, 5] that apply to this situation. Write $M = M_r + M_c$, where $M_r$ is a rank-r approximation to M, corresponding to the r largest singular values of M, and $M_c$ is the residual part of M (the "tail" of M). Ideally, our goal is to estimate M up to an error that is not much larger than $M_c$. First, we can bound the error in nuclear norm (assuming the data has no noise):

Proposition 2.2 (Theorems from [4]) Let $\mathcal{A}: \mathbb{C}^{d\times d} \to \mathbb{C}^m$ be the random Pauli sampling operator, with $m = Crd\log^6 d$, for some absolute constant C. Then, with high probability over the choice of $\mathcal{A}$, the following holds:

Let M be any matrix in $\mathbb{C}^{d\times d}$, and write $M = M_r + M_c$, as described above. Say we observe $y = \mathcal{A}(M)$, with no noise. Let $\hat{M}$ be the Dantzig selector (4) with λ = 0. Then

$$\|\hat{M} - M\|_* \le C_0\|M_c\|_*, \tag{7}$$

where $C_0$ is an absolute constant.

We can also bound the error in Frobenius norm, allowing for noisy data: 

Proposition 2.3 (Lemma 3.2 from [5]) Assume the same set-up as above, but say we observe $y = \mathcal{A}(M) + z$, where $z \sim N(0, \sigma^2 I)$. Let $\hat{M}$ be the Dantzig selector (4) with $\lambda = 8\sqrt{d}\,\sigma$, or the Lasso (5) with $\mu = 16\sqrt{d}\,\sigma$. Then, with high probability over the noise z,

$$\|\hat{M} - M\|_F \le C_0\sqrt{rd}\,\sigma + C_1\|M_c\|_*/\sqrt{r}, \tag{8}$$

where $C_0$ and $C_1$ are absolute constants.

This bounds the error of $\hat{M}$ in terms of the noise strength σ and the size of the tail $M_c$. It is universal: one sampling operator $\mathcal{A}$ works for all matrices M. While this bound may seem unnatural because it mixes different norms, it can be quite useful. When M actually is low-rank (with rank r), then $M_c = 0$, and the bound (8) becomes particularly simple. The dependence on the noise strength σ is known to be nearly minimax-optimal [5]. Furthermore, when some of the singular values of M fall below the "noise level" $\sqrt{d}\,\sigma$, one can show a tighter bound, with a nearly-optimal bias-variance tradeoff; see Theorem 2.7 in [5] for details.

On the other hand, when M is full-rank, then the error of $\hat{M}$ depends on the behavior of the tail $M_c$. We will consider a couple of cases. First, suppose we do not assume anything about M, besides the fact that it is a density matrix for a quantum state. Then $\|M\|_* = 1$, hence $\|M_c\|_* \le 1$, and we can use (8) to get $\|\hat{M} - M\|_F \le C_0\sqrt{rd}\,\sigma + C_1/\sqrt{r}$. Thus, even for arbitrary (not necessarily low-rank) quantum states, the estimator $\hat{M}$ gives nontrivial results. The $O(1/\sqrt{r})$ term can be interpreted as the penalty for only measuring an incomplete subset of the Pauli observables.

Finally, consider the case where M is full-rank, but we do know that the tail $M_c$ is small. If we know that $M_c$ is small in nuclear norm, then we can use equation (8). However, if we know that $M_c$ is small in Frobenius norm, one can give a different bound, using ideas from [5], as follows.
Proposition 2.4 Let M be any matrix in $\mathbb{C}^{d\times d}$, with singular values $\sigma_1(M) \ge \cdots \ge \sigma_d(M)$. Choose a random Pauli sampling operator $\mathcal{A}: \mathbb{C}^{d\times d} \to \mathbb{C}^m$, with $m = Crd\log^6 d$, for some absolute constant C. Say we observe $y = \mathcal{A}(M) + z$, where $z \sim N(0, \sigma^2 I)$. Let $\hat{M}$ be the Dantzig selector (4) with $\lambda = 16\sqrt{d}\,\sigma$, or the Lasso (5) with $\mu = 32\sqrt{d}\,\sigma$. Then, with high probability over the choice of $\mathcal{A}$ and the noise z,

$$\|\hat{M} - M\|_F^2 \le C_0\sum_{i=1}^r \min\big(\sigma_i^2(M),\, d\sigma^2\big) + C_2(\log^6 d)\sum_{i=r+1}^d \sigma_i^2(M), \tag{9}$$

where $C_0$ and $C_2$ are absolute constants.

This bound can be interpreted as follows. The first term expresses the bias-variance tradeoff for estimating $M_r$, while the second term depends on the Frobenius norm of $M_c$. (Note that the $\log^6 d$ factor may not be tight.) In particular, since each of the first r summands is at most $d\sigma^2$, this implies: $\|\hat{M} - M\|_F \le \sqrt{C_0}\sqrt{rd}\,\sigma + \sqrt{C_2}(\log^3 d)\|M_c\|_F$. This can be compared with equation (8) (involving $\|M_c\|_*$). This bound will be better when $\|M_c\|_F \ll \|M_c\|_*$, i.e., when the tail $M_c$ has slowly-decaying eigenvalues (in physical terms, it is highly mixed).
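To see when the Frobenius-norm bound helps, compare $\|M_c\|_*$ and $\|M_c\|_F$ for two made-up spectra of a d = 64 density matrix with r = 1 (the spectra and the helper name are illustrative assumptions). By Cauchy-Schwarz, $\|M_c\|_* \le \sqrt{d - r}\,\|M_c\|_F$, with equality when the tail is flat, i.e., maximally mixed.

```python
import math

def tail_norms(eigenvalues, r):
    """(||M_c||_*, ||M_c||_F) for a positive semidefinite M with the given eigenvalues,
    where M_c keeps all but the r largest (for PSD M, singular values = eigenvalues)."""
    tail = sorted(eigenvalues, reverse=True)[r:]
    return sum(tail), math.sqrt(sum(x * x for x in tail))

if __name__ == "__main__":
    d, r = 64, 1
    flat = [0.9] + [0.1 / (d - 1)] * (d - 1)               # highly mixed tail
    decaying = [0.9, 0.09, 0.009, 0.001] + [0.0] * (d - 4)  # rapidly decaying tail
    for name, spec in (("flat", flat), ("decaying", decaying)):
        nuc, fro = tail_norms(spec, r)
        print(name, round(nuc, 4), round(fro, 4), round(nuc / fro, 2))
    # flat: ratio about 7.94 = sqrt(63); decaying: ratio about 1.11
```

Both tails have $\|M_c\|_* = 0.1$, but only the flat one has a much smaller Frobenius norm, which is the regime where (9) improves on (8).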

Proposition 2.4 is an adaptation of Theorem 2.8 in [5]. We sketch the proof in section B. Note that this bound is not universal: it shows that for all matrices M, a random choice of the sampling operator $\mathcal{A}$ is likely to work.



3 Proof of the RIP for Pauli Measurements 



We now prove Theorem 2.1. The general approach involving Dudley's entropy bound is similar to [12], while the technical part of the proof (bounding certain covering numbers) uses ideas from [16]. We summarize the argument here; the details are given in section A.

3.1 Overview 

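Let $U_2 = \{X \in \mathbb{C}^{d\times d} \mid \|X\|_F \le 1,\ \|X\|_* \le \sqrt{r}\,\|X\|_F\}$. Let $\mathcal{B}$ be the set of all self-adjoint linear operators from $\mathbb{C}^{d\times d}$ to $\mathbb{C}^{d\times d}$, and define the following norm on $\mathcal{B}$:

$$\|\mathcal{M}\|_{(r)} = \sup_{X\in U_2}|(X, \mathcal{M}X)|. \tag{10}$$

(Suppose $r \ge 2$, which is sufficient for our purposes. It is straightforward to show that $\|\cdot\|_{(r)}$ is a norm, and that $\mathcal{B}$ is a Banach space with respect to this norm.) Then let us define

$$\varepsilon_r(\mathcal{A}) = \|\mathcal{A}^*\mathcal{A} - \mathcal{I}\|_{(r)}. \tag{11}$$

By an elementary argument, in order to prove RIP, it suffices to show that $\varepsilon_r(\mathcal{A}) \le 2\delta - \delta^2$. We will proceed as follows: we will first bound $\mathbb{E}\,\varepsilon_r(\mathcal{A})$, then show that $\varepsilon_r(\mathcal{A})$ is concentrated around its mean.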

Using a standard symmetrization argument, we have that $\mathbb{E}\,\varepsilon_r(\mathcal{A}) \le 2\,\mathbb{E}\,\big\|\sum_{j=1}^m \varepsilon_j\frac{d^2}{m}|S_j)(S_j|\big\|_{(r)}$, where the $\varepsilon_j$ are Rademacher (iid ±1) random variables. Here the round ket notation $|S_j)$ means we view the matrix $S_j$ as an element of the vector space $\mathbb{C}^{d^2}$ with Hilbert-Schmidt inner product; the round bra $(S_j|$ denotes the adjoint element in the (dual) vector space.

Now we use the following lemma, which we will prove later. This bounds the expected magnitude in (r)-norm of a Rademacher sum of a fixed collection of operators $V_1, \ldots, V_m$ that have small operator norm.

Lemma 3.1 Let $m \le d^2$. Fix some $V_1, \ldots, V_m \in \mathbb{C}^{d\times d}$ that have uniformly bounded operator norm, $\|V_i\| \le K$ (for all i). Let $\varepsilon_1, \ldots, \varepsilon_m$ be iid uniform ±1 random variables. Then

$$\mathbb{E}_\varepsilon\Big\|\sum_{i=1}^m \varepsilon_i|V_i)(V_i|\Big\|_{(r)} \le C_5\cdot\Big\|\sum_{i=1}^m |V_i)(V_i|\Big\|_{(r)}^{1/2}, \tag{12}$$

where $C_5 = \sqrt{r}\cdot C_4K\log^{5/2}d\,\log^{1/2}m$ and $C_4$ is some universal constant.
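After some algebra (apply Lemma 3.1 with $V_j = \sqrt{d}\,S_j$, so that $\|V_j\| \le K$ by (1), use $\mathbb{E}\|\mathcal{A}^*\mathcal{A}\|_{(r)}^{1/2} \le (\mathbb{E}\,\varepsilon_r(\mathcal{A}) + 1)^{1/2}$, and use $m \le d^2$ to absorb $\log^{1/2} m \le \sqrt{2}\log^{1/2} d$ into $C_4$), one gets that $\mathbb{E}\,\varepsilon_r(\mathcal{A}) \le 2(\mathbb{E}\,\varepsilon_r(\mathcal{A}) + 1)^{1/2}\cdot C_5'\cdot\sqrt{d/m}$, where $C_5' = \sqrt{r}\cdot C_4K\log^3 d$. By finding the roots of this quadratic equation, we get the following bound on $\mathbb{E}\,\varepsilon_r(\mathcal{A})$. Let $\lambda \ge 1$. Assume that $m \ge \lambda d(2C_5')^2 = \lambda\cdot 4C_4^2\cdot dr\cdot K^2\log^6 d$. Then we have the desired result:

$$\mathbb{E}\,\varepsilon_r(\mathcal{A}) \le \frac{1}{\lambda} + \frac{1}{\sqrt{\lambda}}. \tag{13}$$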
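The root-finding step can be checked numerically. Writing $x = \mathbb{E}\,\varepsilon_r(\mathcal{A})$ and $a = 2C_5'\sqrt{d/m}$, the inequality $x \le a\sqrt{x+1}$ with $x \ge 0$ is equivalent to $x^2 - a^2x - a^2 \le 0$, so x is at most the positive root; the assumption on m gives $a^2 \le 1/\lambda$. The Python sketch below (hypothetical names) compares that root with the right-hand side of (13).

```python
import math

def largest_admissible(a):
    """Largest x >= 0 with x <= a * sqrt(x + 1): the positive root of x^2 - a^2 x - a^2 = 0."""
    a2 = a * a
    return (a2 + math.sqrt(a2 * a2 + 4 * a2)) / 2

def rhs_13(lam):
    """Right-hand side of (13): 1/lambda + 1/sqrt(lambda)."""
    return 1 / lam + 1 / math.sqrt(lam)

if __name__ == "__main__":
    for lam in (1, 4, 16, 100, 10_000):
        a = 1 / math.sqrt(lam)  # the case m = lambda * d * (2 C5')^2, i.e. a^2 = 1/lambda
        x = largest_admissible(a)
        print(lam, round(x, 4), round(rhs_13(lam), 4), x <= rhs_13(lam))
```

Since $\sqrt{u+v} \le \sqrt{u} + \sqrt{v}$, the root never exceeds $1/\lambda + 1/\sqrt{\lambda}$, so taking λ large makes $\mathbb{E}\,\varepsilon_r(\mathcal{A})$ as small as desired.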

It remains to show that $\varepsilon_r(\mathcal{A})$ is concentrated around its expectation. For this we use a concentration inequality from [22] for sums of independent symmetric random variables that take values in some Banach space. See section A for details.

3.2 Proof of Lemma 3.1 (bounding a Rademacher sum in (r)-norm)

Let $L_0 = \mathbb{E}_\varepsilon\big\|\sum_{i=1}^m \varepsilon_i|V_i)(V_i|\big\|_{(r)}$; this is the quantity we want to bound. Using a standard comparison principle, we can replace the ±1 random variables $\varepsilon_i$ with iid N(0, 1) Gaussian random variables $g_i$; then, since $(X, |V_i)(V_i|X) = |(V_i, X)|^2$, we get

$$L_0 \le \sqrt{\pi/2}\;\mathbb{E}_g\sup_{X\in U_2}|G(X)|, \qquad G(X) = \sum_{i=1}^m g_i\,|(V_i, X)|^2. \tag{14}$$

The random variables G(X) (indexed by $X \in U_2$) form a Gaussian process, and $L_0$ is upper-bounded by the expected supremum of this process. Using the fact that G(0) = 0 and G(·) is symmetric, and Dudley's inequality (Theorem 11.17 in [22]), we have

$$L_0 \le \sqrt{2\pi}\;\mathbb{E}_g\sup_{X\in U_2}G(X) \le 24\sqrt{2\pi}\int_0^\infty \log^{1/2}N(U_2, d_G, \varepsilon)\,d\varepsilon, \tag{15}$$

where $N(U_2, d_G, \varepsilon)$ is a covering number (the number of balls in $\mathbb{C}^{d\times d}$ of radius ε in the metric $d_G$ that are needed to cover the set $U_2$), and the metric $d_G$ is given by

$$d_G(X, Y) = \big(\mathbb{E}[(G(X) - G(Y))^2]\big)^{1/2}. \tag{16}$$

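Define a new norm (actually a semi-norm) $\|\cdot\|_X$ on $\mathbb{C}^{d\times d}$, as follows:

$$\|M\|_X = \max_{i=1,\ldots,m}|(V_i, M)|. \tag{17}$$

We use this to upper-bound the metric $d_G$. An elementary calculation shows that $d_G(X, Y) \le 2R\|X - Y\|_X$, where $R = \sup_{Z\in U_2}\big(\sum_{i=1}^m |(V_i, Z)|^2\big)^{1/2} = \big\|\sum_{i=1}^m |V_i)(V_i|\big\|_{(r)}^{1/2}$. This lets us upper-bound the covering numbers in $d_G$ with covering numbers in $\|\cdot\|_X$:

$$N(U_2, d_G, \varepsilon) \le N\Big(U_2, \|\cdot\|_X, \frac{\varepsilon}{2R}\Big) = N\Big(\frac{1}{\sqrt{r}}U_2, \|\cdot\|_X, \frac{\varepsilon}{2R\sqrt{r}}\Big). \tag{18}$$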
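For completeness, here is one way to carry out the elementary calculation behind $d_G(X, Y) \le 2R\|X - Y\|_X$; it is our reconstruction of the omitted step, using only (14), (16), (17), and the definition of R. Since the $g_i$ are iid N(0, 1), for $X, Y \in U_2$:

```latex
\[
\begin{aligned}
d_G(X,Y)^2 &= \sum_{i=1}^m \big(|(V_i,X)|^2 - |(V_i,Y)|^2\big)^2 \\
&\le \sum_{i=1}^m |(V_i,X-Y)|^2\,\big(|(V_i,X)| + |(V_i,Y)|\big)^2
  \qquad \big(\,\big||a|^2 - |b|^2\big| \le |a-b|\,(|a|+|b|)\big) \\
&\le \|X-Y\|_X^2 \sum_{i=1}^m \big(|(V_i,X)| + |(V_i,Y)|\big)^2
  \qquad \text{(by (17))} \\
&\le \|X-Y\|_X^2 \Big[\Big(\sum_i |(V_i,X)|^2\Big)^{1/2} + \Big(\sum_i |(V_i,Y)|^2\Big)^{1/2}\Big]^2
  \qquad \text{(Minkowski)} \\
&\le (2R)^2\,\|X-Y\|_X^2 .
\end{aligned}
\]
```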

We will now bound these covering numbers. First, we introduce some notation: let $\|\cdot\|_p$ denote the Schatten p-norm on $\mathbb{C}^{d\times d}$, and let $B_p$ be the unit ball in this norm. Also, let $B_X$ be the unit ball in the $\|\cdot\|_X$ norm.

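Observe that $\frac{1}{\sqrt{r}}U_2 \subseteq B_1 \subseteq K\cdot B_X$. (The first inclusion holds because $\|X\|_* \le \sqrt{r}\,\|X\|_F \le \sqrt{r}$ for $X \in U_2$; the second follows because $\|M\|_X \le \max_{i=1,\ldots,m}\|V_i\|\,\|M\|_* \le K\|M\|_*$.) This gives a simple bound on the covering numbers:

$$N\Big(\frac{1}{\sqrt{r}}U_2, \|\cdot\|_X, \varepsilon\Big) \le N(B_1, \|\cdot\|_X, \varepsilon) \le N(K\cdot B_X, \|\cdot\|_X, \varepsilon). \tag{19}$$

This is 1 when $\varepsilon \ge K$. So, in Dudley's inequality, we can restrict the integral to the interval [0, K]. When ε is small, we will use the following simple bound (equation (5.7) in [23]):

$$N(K\cdot B_X, \|\cdot\|_X, \varepsilon) \le \Big(1 + \frac{2K}{\varepsilon}\Big)^{2d^2}. \tag{20}$$

When ε is large, we will use a more sophisticated bound based on Maurey's empirical method and entropy duality, which is due to [16] (see also [17]):

$$N(B_1, \|\cdot\|_X, \varepsilon) \le \exp\Big(\frac{C_1^2K^2}{\varepsilon^2}\log^3 d\,\log m\Big), \quad \text{for some constant } C_1. \tag{21}$$

We defer the proof of (21) to the next section.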

Using (20) and (21), we can bound the integral in Dudley's inequality. We get

$$L_0 \le C_4R\sqrt{r}\,K\log^{5/2}d\,\log^{1/2}m, \tag{22}$$

where $C_4$ is some universal constant. This proves the lemma.
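The following sketch shows how (20) and (21) combine to give (22). The split point $a = K/d$ is our choice for illustration, logarithms are natural, and we assume $d \ge 3$ and $m \ge 2$; the bound on the first piece follows from the elementary estimate $\int_0^{b}\sqrt{\log(3/s)}\,ds = O\big(b\sqrt{\log(3/b)}\big)$ for small b, after substituting $s = t/K$ and using $1 + 2/s \le 3/s$ for $s \le 1$.

```latex
% (1) Substitute eps = 2 R sqrt(r) t in (15) and use (18)-(19); the integrand vanishes for t >= K.
\[
\int_0^\infty \log^{1/2} N(U_2, d_G, \varepsilon)\,d\varepsilon
  \;\le\; 2R\sqrt{r}\int_0^{K} \log^{1/2} N(B_1, \|\cdot\|_X, t)\,dt .
\]
% (2) Split at a = K/d; use (20) on [0, a] and (21) on [a, K].
\[
\int_0^{a} \sqrt{2d^2\log(1 + 2K/t)}\;dt = O\big(K\sqrt{\log d}\big),
\qquad
\int_a^{K} \frac{C_1K}{t}\,\log^{3/2} d\,\log^{1/2} m\;dt = C_1K\log^{5/2} d\,\log^{1/2} m .
\]
% (3) The second piece dominates; absorbing 24\sqrt{2\pi}, the factor 2, and C_1 into C_4 gives (22).
\[
L_0 \;\le\; C_4\,R\sqrt{r}\,K\log^{5/2} d\,\log^{1/2} m .
\]
```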
3.3 Proof of Equation (21) (covering numbers of the nuclear-norm ball)

Our result will follow easily from a bound on covering numbers introduced in [16] (where it appears as Lemma 1):

Lemma 3.2 Let E be a Banach space, having modulus of convexity of power type 2 with constant λ(E). Let E* be the dual space, and let $T_2(E^*)$ denote its type 2 constant. Let $B_E$ denote the unit ball in E.

Let $V_1, \ldots, V_m \in E^*$, such that $\|V_j\|_{E^*} \le K$ (for all j). Define the norm on E,

$$\|M\|_X = \max_{j=1,\ldots,m}|(V_j, M)|. \tag{23}$$

Then, for any ε > 0,

$$\varepsilon\log^{1/2}N(B_E, \|\cdot\|_X, \varepsilon) \le C_2\,\lambda(E)^2\,T_2(E^*)\,K\log^{1/2}m, \tag{24}$$

where $C_2$ is some universal constant.

The proof uses entropy duality to reduce the problem to bounding the "dual" covering number. The basic idea is as follows. Let $\ell_p^m$ denote the complex vector space $\mathbb{C}^m$ with the $\ell_p$ norm. Consider the map $S: \ell_1^m \to E^*$ that takes the j'th coordinate vector to $V_j$. Let N(S) denote the number of balls in E* needed to cover the image (under the map S) of the unit ball in $\ell_1^m$. We can bound N(S) using Maurey's empirical method. Also define the dual map $S^*: E \to \ell_\infty^m$, and the associated dual covering number $N(S^*)$. Then $N(B_E, \|\cdot\|_X, \varepsilon)$ is related to $N(S^*)$. Finally, N(S) and $N(S^*)$ are related via entropy duality inequalities. See [16] for details.
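Maurey's empirical method, used to bound N(S), is simple enough to illustrate numerically. The Python sketch below (hypothetical names) works in a real Euclidean space, whose type 2 constant is 1, and with an ordinary convex combination rather than the absolute convex hull used in the proof. If $x = \sum_j w_jV_j$ with $\|V_j\| \le K$, averaging k iid draws $V_J$ with $P(J = j) = w_j$ gives $\mathbb{E}\|Z - x\|^2 = (\sum_j w_j\|V_j\|^2 - \|x\|^2)/k \le K^2/k$. Hence some average of k of the $V_j$ lies within $K/\sqrt{k}$ of x, and there are at most $m^k$ such averages, so the log-covering number at scale ε is at most about $(K^2/\varepsilon^2)\log m$, the same shape as (21).

```python
import random

def convex_combination(weights, points):
    """x = sum_j w_j V_j for points given as lists of floats."""
    dim = len(points[0])
    return [sum(w * p[i] for w, p in zip(weights, points)) for i in range(dim)]

def exact_mean_sq_error(weights, points, k):
    """E ||Z - x||^2 for Z = mean of k iid draws V_J with P(J = j) = w_j,
    i.e. (sum_j w_j ||V_j||^2 - ||x||^2) / k (variance of an empirical mean)."""
    x = convex_combination(weights, points)
    second_moment = sum(w * sum(c * c for c in p) for w, p in zip(weights, points))
    return (second_moment - sum(c * c for c in x)) / k

def empirical_average(weights, points, k, rng):
    """One Maurey approximation: the average of k points drawn iid according to `weights`."""
    draws = rng.choices(points, weights=weights, k=k)
    return [sum(p[i] for p in draws) / k for i in range(len(points[0]))]

if __name__ == "__main__":
    rng = random.Random(3)
    pts = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # ||V_j|| <= K = 1
    w = [0.5, 0.3, 0.2]
    x = convex_combination(w, pts)
    for k in (1, 4, 16, 64):
        trials = [empirical_average(w, pts, k, rng) for _ in range(2000)]
        mc = sum(sum((a - b) ** 2 for a, b in zip(z, x)) for z in trials) / len(trials)
        print(k, round(mc, 4), round(exact_mean_sq_error(w, pts, k), 4), 1 / k)
```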

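We will apply this lemma as follows, using the same approach as [17]. Let $S_p$ denote the Banach space consisting of all matrices in $\mathbb{C}^{d\times d}$ with the Schatten p-norm. Intuitively, we want to set $E = S_1$ and $E^* = S_\infty$, but this won't work because $\lambda(S_1)$ is infinite. Instead, we let $E = S_p$, $p = (\log d)/(\log d - 1)$, and $E^* = S_q$, $q = \log d$. Note that $\|M\|_p \le \|M\|_*$, hence $B_1 \subseteq B_p$ and

$$\varepsilon\log^{1/2}N(B_1, \|\cdot\|_X, \varepsilon) \le \varepsilon\log^{1/2}N(B_p, \|\cdot\|_X, \varepsilon). \tag{25}$$

Also, we have $\lambda(E) \le 1/\sqrt{p-1} = \sqrt{\log d - 1}$ and $T_2(E^*) \le \lambda(E) \le \sqrt{\log d - 1}$ (see the Appendix in [17]), so $\lambda(E)^2T_2(E^*) \le \log^{3/2}d$. Note that $\|V_j\|_q \le d^{1/q}\|V_j\| = e\|V_j\|$, thus we have $\|V_j\|_q \le eK$ (for all j). Then, using the lemma, we have

$$\varepsilon\log^{1/2}N(B_p, \|\cdot\|_X, \varepsilon) \le C_2\log^{3/2}d\,(eK)\log^{1/2}m, \tag{26}$$

which proves the claim: squaring (26) and combining it with (25) gives (21) with $C_1 = eC_2$.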
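The parameter choices in this step are easy to check numerically (natural logarithms, d > e; function names are hypothetical). The sketch confirms that p and q are conjugate exponents, that $d^{1/q} = e$ (so $\|M\|_q \le d^{1/q}\|M\| = e\|M\|$, with equality for a flat spectrum), that $1/\sqrt{p-1} = \sqrt{\log d - 1}$, and that $\|M\|_p \le \|M\|_*$ for a normalized spectrum.

```python
import math

def chosen_exponents(d):
    """p = log d / (log d - 1) and q = log d, natural logarithms; requires d > e."""
    q = math.log(d)
    return q / (q - 1), q

def schatten_norm(singular_values, p):
    """(sum_i sigma_i^p)^(1/p) computed from the singular values."""
    return sum(abs(s) ** p for s in singular_values) ** (1.0 / p)

if __name__ == "__main__":
    for d in (16, 256, 4096):
        p, q = chosen_exponents(d)
        print(d, round(p, 4), round(q, 4),
              round(1 / p + 1 / q, 12),                # conjugate exponents: 1/p + 1/q = 1
              round(schatten_norm([1.0] * d, q), 12),  # ||I||_q = d^(1/q) = e, while ||I|| = 1
              round(1 / math.sqrt(p - 1), 6), round(math.sqrt(math.log(d) - 1), 6))
```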

4 Outlook 

We have shown that random Pauli measurements obey the restricted isometry property (RIP), which implies strong error bounds for low-rank matrix recovery. The key technical tool was a bound on covering numbers of the nuclear norm ball, due to Guedon et al. [16].

An interesting question is whether this method can be applied to other problems, such as matrix completion, or constructing embeddings of low-dimensional manifolds into linear spaces with slightly higher dimension. For matrix completion, one can compare with the work of Negahban and Wainwright [10], where the sampling operator satisfies restricted strong convexity (RSC) over a certain set of "non-spiky" low-rank matrices. For manifold embeddings, one could try to generalize the results of [24], which use the sparse-vector RIP to construct Johnson-Lindenstrauss metric embeddings.

There are also many questions pertaining to low-rank quantum state tomography. For example, 
how does the matrix Lasso compare to the traditional approach using maximum likelihood estima- 
tion? Also, there are several variations on the basic tomography problem, and alternative notions of 
sparsity (e.g., elementwise sparsity in a known basis) 1,25 J , which have not been fully explored. 

Acknowledgements: Thanks to David Gross, Yaniv Plan, Emmanuel Candes, Stephen Jordan, and 
the anonymous reviewers, for helpful suggestions. Parts of this work were done at the University 
of California, Berkeley, and supported by NIST grant number 60NANB10D262. This paper is 
a contribution of the National Institute of Standards and Technology, and is not subject to U.S. 
copyright. 
Universal low-rank matrix recovery from Pauli measurements: Supplementary material

A Proof of the RIP for Pauli Measurements 
A.1 Overview



We now prove Theorem 2.1. In this section we give an overview; proofs of the technical claims are deferred to later sections. The general approach involving Dudley's entropy bound is similar to [12], while the technical part of the proof (bounding certain covering numbers) uses ideas from [16].

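Recall the definition of the restricted isometry property, with constant $0 < \delta < 1$. Let

$$U = \{X \in \mathbb{C}^{d\times d} \mid \|X\|_* \le \sqrt{r}\,\|X\|_F\}. \tag{27}$$

Let us define

$$U_2 = \{X \in \mathbb{C}^{d\times d} \mid \|X\|_F \le 1,\ \|X\|_* \le \sqrt{r}\,\|X\|_F\}, \tag{28}$$

$$\varepsilon_r(\mathcal{A}) = \sup_{X\in U_2}|(X, (\mathcal{A}^*\mathcal{A} - \mathcal{I})X)|. \tag{29}$$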

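Also, define $\varepsilon = 2\delta - \delta^2$. We claim that, to show RIP, it suffices to show $\varepsilon_r(\mathcal{A}) \le \varepsilon$. To see this, note that the RIP condition is equivalent to the statement

$$\text{for all } X \in U, \quad (1-\delta)^2(X, X) \le (X, \mathcal{A}^*\mathcal{A}X) \le (1+\delta)^2(X, X), \tag{30}$$

which is equivalent to

$$\text{for all } X \in U, \quad (-2\delta + \delta^2)(X, X) \le (X, (\mathcal{A}^*\mathcal{A} - \mathcal{I})X) \le (2\delta + \delta^2)(X, X), \tag{31}$$

which (since both sides scale quadratically in X and U is a cone, so it suffices to take $\|X\|_F = 1$) is implied by

$$\text{for all } X \in U_2, \quad |(X, (\mathcal{A}^*\mathcal{A} - \mathcal{I})X)| \le \min\{2\delta + \delta^2,\ 2\delta - \delta^2\} = 2\delta - \delta^2. \tag{32}$$

Thus our goal is to show $\varepsilon_r(\mathcal{A}) \le \varepsilon$. (Note that for δ in the range [0, 1], we have that $\varepsilon \ge \delta$.)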
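The reduction from (30) to (32) can be checked numerically. This Python sketch (hypothetical names) maps δ to ε = 2δ − δ² and back, and tests the two-sided condition (30) for a given value of $\|\mathcal{A}(X)\|_2^2/\|X\|_F^2$. Because $(1-\delta)^2 = 1 - \varepsilon$ while $(1+\delta)^2 = 1 + \varepsilon + 2\delta^2$, the lower side is the binding one, which is why the minimum in (32) is $2\delta - \delta^2$.

```python
import math

def eps_from_delta(delta):
    """Target epsilon = 2*delta - delta**2 for eps_r(A)."""
    return 2 * delta - delta ** 2

def delta_from_eps(eps):
    """Inverse on [0, 1]: delta = 1 - sqrt(1 - eps)."""
    return 1 - math.sqrt(1 - eps)

def satisfies_30(ratio_sq, delta):
    """Condition (30) for one X, given ratio_sq = (X, A*A X) / (X, X) = ||A(X)||_2^2 / ||X||_F^2."""
    return (1 - delta) ** 2 <= ratio_sq <= (1 + delta) ** 2

if __name__ == "__main__":
    for delta in (0.1, 0.3, 0.5):
        eps = eps_from_delta(delta)
        # |ratio_sq - 1| <= eps, i.e. (32) for this X, is enough for (30):
        print(delta, eps, satisfies_30(1 - 0.999 * eps, delta), satisfies_30(1 + eps, delta))
```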

Let $\mathcal{B}$ be the set of all self-adjoint linear operators from $\mathbb{C}^{d\times d}$ to $\mathbb{C}^{d\times d}$, and define the following norm on $\mathcal{B}$:

$$\|\mathcal{M}\|_{(r)} = \sup_{X\in U_2}|(X, \mathcal{M}X)|. \tag{33}$$

Suppose that $r \ge 2$ (this will suffice for our purposes, since RIP with r = 2 implies RIP with r = 1). We claim that $\|\cdot\|_{(r)}$ is a norm, and that $\mathcal{B}$ is a Banach space with respect to this norm.

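To show these claims, we will consider the Frobenius norm $\|\cdot\|_F$ on $\mathcal{B}$, which is defined by viewing each element of $\mathcal{B}$ as a "matrix" acting on "vectors" that are elements of $\mathbb{C}^{d\times d}$. Then we will bound $\|\cdot\|_{(r)}$ in terms of $\|\cdot\|_F$. More precisely, let $e_a$ ($a \in \{0, 1, \ldots, d-1\}$) be the standard basis vectors in $\mathbb{C}^d$, and let $E_{ab} = e_ae_b^*$ ($a, b \in \{0, 1, \ldots, d-1\}$) be the standard basis vectors in $\mathbb{C}^{d\times d}$. Then the Frobenius norm on $\mathcal{B}$ can be written as

$$\|\mathcal{M}\|_F = \Big(\sum_{a,b,c,d}|(E_{cd}|\mathcal{M}|E_{ab})|^2\Big)^{1/2}. \tag{34}$$

We claim that, for all $\mathcal{M} \in \mathcal{B}$,

$$\|\mathcal{M}\|_F \le 3\sqrt{2}\,d^2\,\|\mathcal{M}\|_{(r)}. \tag{35}$$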

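To see this, suppose that $\|M\|_F > \mu$. Since (34) is a sum of $d^4$ terms, there must exist $a, b, a', b' \in \{0, 1, \ldots, d-1\}$ such that $|(E_{a'b'}|M|E_{ab})| > \mu/d^2$. If $E_{ab} = E_{a'b'}$, then $\|M\|_{(r)} > \mu/d^2$. Otherwise, $(E_{ab}|E_{a'b'}) = 0$, and at least one of the following must be true:

$$|\mathrm{Re}(E_{a'b'}|M|E_{ab})| > \mu/(\sqrt{2}\,d^2) \quad \text{(case 1)}, \tag{36}$$

$$|\mathrm{Im}(E_{a'b'}|M|E_{ab})| > \mu/(\sqrt{2}\,d^2) \quad \text{(case 2)}. \tag{37}$$

In case 1, let $X = \frac{1}{\sqrt{2}}(E_{ab} + E_{a'b'})$, and write

$$\mathrm{Re}(E_{a'b'}|M|E_{ab}) = (X|M|X) - \tfrac{1}{2}(E_{ab}|M|E_{ab}) - \tfrac{1}{2}(E_{a'b'}|M|E_{a'b'}). \tag{38}$$

One of the three terms on the right-hand side must have absolute value greater than $\mu/(3\sqrt{2}\,d^2)$. The matrices $X$, $E_{ab}$ and $E_{a'b'}$ all lie in $U_2$; for $X$ this uses $r \ge 2$, since $X$ has rank at most 2. It follows that $\|M\|_{(r)} > \mu/(3\sqrt{2}\,d^2)$. In case 2, let $X = \frac{1}{\sqrt{2}}(E_{ab} + iE_{a'b'})$, and write

$$\mathrm{Im}(E_{a'b'}|M|E_{ab}) = (X|M|X) - \tfrac{1}{2}(E_{ab}|M|E_{ab}) - \tfrac{1}{2}(E_{a'b'}|M|E_{a'b'}). \tag{39}$$

By a similar argument, we get $\|M\|_{(r)} > \mu/(3\sqrt{2}\,d^2)$. Since $\mu < \|M\|_F$ was arbitrary, this shows (35).

In addition, it is straightforward to see that

$$\|M\|_{(r)} \le \sup_{X:\,\|X\|_F \le 1} |(X, MX)| \le \|M\|_{\mathrm{op}} \le \|M\|_F. \tag{40}$$

Finally, using (35) and (40), we see that $\|\cdot\|_{(r)}$ is a norm, and that $\mathcal{B}$ is a Banach space with respect to $\|\cdot\|_{(r)}$. This follows because the same properties already hold for $\|\cdot\|_F$. In particular, $\|\cdot\|_{(r)}$ is nondegenerate ($\|M\|_{(r)} = 0$ implies $M = 0$), and $\mathcal{B}$ is complete with respect to $\|\cdot\|_{(r)}$.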
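The polarization identities (38) and (39) are where the self-adjointness of $M$ enters. The sketch below checks them numerically for a random self-adjoint $M$ acting on $\mathbb{C}^n$, which stands in for $\mathbb{C}^{d\times d}$ with $n = d^2$. Two distinct standard basis vectors play the roles of $E_{ab}$ and $E_{a'b'}$. Inner products are taken to be antilinear in the first argument, as in the text. All helper names are hypothetical.

```python
import math
import random


def inner(x, y):
    """Inner product on C^n, antilinear in the first argument: (x|y) = sum conj(x_i) y_i."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def apply(M, y):
    """Matrix-vector product M y for a nested-list matrix."""
    return [sum(row[j] * y[j] for j in range(len(y))) for row in M]


def expectation(M, x):
    """(x|M|x)."""
    return inner(x, apply(M, x))


def random_hermitian(n, rng):
    """A random self-adjoint operator, playing the role of M in B (n = d^2)."""
    B = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)] for _ in range(n)]
    return [[B[i][j] + B[j][i].conjugate() for j in range(n)] for i in range(n)]


def polarization_residuals(n=4, a=0, c=2, seed=0):
    """Residuals of (38) and (39) for distinct basis vectors u = E_ab and v = E_a'b'."""
    rng = random.Random(seed)
    M = random_hermitian(n, rng)
    u = [0j] * n
    v = [0j] * n
    u[a] = 1 + 0j
    v[c] = 1 + 0j
    z = inner(v, apply(M, u))  # (E_a'b'|M|E_ab)
    s = 1 / math.sqrt(2)
    x_re = [s * (ui + vi) for ui, vi in zip(u, v)]       # X from case 1
    x_im = [s * (ui + 1j * vi) for ui, vi in zip(u, v)]  # X from case 2
    diag = 0.5 * expectation(M, u) + 0.5 * expectation(M, v)
    res_re = abs(expectation(M, x_re) - diag - z.real)
    res_im = abs(expectation(M, x_im) - diag - z.imag)
    return res_re, res_im
```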

Returning to our main proof, we can now write $\epsilon_r(\mathcal{A}) = \|\mathcal{A}^*\mathcal{A} - I\|_{(r)}$. The strategy of the proof is to first bound $\mathbb{E}\,\epsilon_r(\mathcal{A})$, and then show that $\epsilon_r(\mathcal{A})$ is concentrated around its mean.

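We claim that

$$\mathbb{E}\,\epsilon_r(\mathcal{A}) \le 2\,\mathbb{E}\,\Big\|\sum_{j=1}^m \varepsilon_j \frac{d^2}{m}|S_j)(S_j|\Big\|_{(r)}, \tag{41}$$

where the $\varepsilon_j$ are Rademacher (iid $\pm 1$) random variables. Here the round ket notation $|S_j)$ means that we view the matrix $S_j$ as an element of the vector space $\mathbb{C}^{d^2}$ with the Hilbert-Schmidt inner product. The round bra $(S_j|$ denotes the adjoint element in the (dual) vector space. The above bound follows from a standard symmetrization argument. Write $\mathcal{A}^*\mathcal{A} - I = \sum_{j=1}^m X_j$, where $X_j = \frac{d^2}{m}|S_j)(S_j| - \frac{1}{m}I$. Then let $X_j'$ be independent copies of the random variables $X_j$, and use equation (2.5) and Lemma 6.3 in [22] to write:

$$\begin{aligned}
\mathbb{E}\,\epsilon_r(\mathcal{A}) &= \mathbb{E}\,\Big\|\sum_j X_j\Big\|_{(r)} \\
&\le \mathbb{E}\,\Big\|\sum_j (X_j - X_j')\Big\|_{(r)} \\
&= \mathbb{E}\,\Big\|\sum_j \varepsilon_j (X_j - X_j')\Big\|_{(r)} \\
&\le 2\,\mathbb{E}\,\Big\|\sum_j \varepsilon_j \frac{d^2}{m}|S_j)(S_j|\Big\|_{(r)}.
\end{aligned} \tag{42}$$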
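The symmetrization step uses $\mathbb{E}X_j = 0$. Take $X_j = \frac{d^2}{m}|S_j)(S_j| - \frac1m I$, and assume that each $S_j$ is drawn uniformly from the $d^2$ Pauli matrices rescaled to unit Frobenius norm. This sampling model is an assumption of the sketch. Under it, $\mathbb{E}X_j = 0$ reduces to the completeness relation $\sum_S |S)(S| = I$ on $\mathbb{C}^{d^2}$, because then $\mathbb{E}\,\frac{d^2}{m}|S_j)(S_j| = \frac{d^2}{m}\cdot\frac{1}{d^2}I = \frac1m I$. The sketch below verifies completeness for one and two qubits.

```python
import itertools
import math

# Single-qubit Pauli matrices I, X, Y, Z as nested lists.
PAULIS_1Q = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]


def kron(a, b):
    """Kronecker product of two square nested-list matrices."""
    m = len(b)
    size = len(a) * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(size)] for i in range(size)]


def normalized_pauli_basis(n_qubits):
    """All d^2 tensor products of Paulis, scaled by 1/sqrt(d), where d = 2**n_qubits."""
    d = 2 ** n_qubits
    basis = []
    for combo in itertools.product(PAULIS_1Q, repeat=n_qubits):
        mat = [[1]]
        for p in combo:
            mat = kron(mat, p)
        basis.append([[complex(x) / math.sqrt(d) for x in row] for row in mat])
    return basis


def vec(mat):
    """Row-major vectorization, so that (A|B) = sum(conj(a_k) * b_k) = tr(A^dagger B)."""
    return [x for row in mat for x in row]


def frame_operator(basis):
    """Return sum_S |S)(S| as a d^2 x d^2 nested list."""
    vs = [vec(s) for s in basis]
    n = len(vs[0])
    return [[sum(v[i] * v[j].conjugate() for v in vs) for j in range(n)] for i in range(n)]


def max_dev_from_identity(mat):
    """Largest entrywise deviation of mat from the identity matrix."""
    n = len(mat)
    return max(abs(mat[i][j] - (1 if i == j else 0)) for i in range(n) for j in range(n))
```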



Now we use the following lemma, which we will prove later. It bounds the expected magnitude, in $(r)$-norm, of a Rademacher sum of a fixed collection of operators $V_1, \ldots, V_m$ that have small operator norm.

Lemma A.1 (restatement of Lemma 3.7) Let $m \le d^2$. Fix some $V_1, \ldots, V_m \in \mathbb{C}^{d\times d}$ that have uniformly bounded operator norm, $\|V_i\| \le K$ (for all $i$). Let $\varepsilon_1, \ldots, \varepsilon_m$ be iid uniform $\pm 1$ random variables. Then

$$\mathbb{E}_\varepsilon\Big\|\sum_{i=1}^m \varepsilon_i |V_i)(V_i|\Big\|_{(r)} \le C_5 \cdot \Big\|\sum_{i=1}^m |V_i)(V_i|\Big\|_{(r)}^{1/2}, \tag{43}$$

where $C_5 = \sqrt{r}\cdot C_4 K \log^{5/2} d\,\log^{1/2} m$ and $C_4$ is some universal constant.



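We apply the lemma as follows. Let $\Omega = \{S_1, \ldots, S_m\}$ be the multiset of all the measurement operators that appear in the sampling operator $\mathcal{A}$. Then we have

$$\mathbb{E}\,\epsilon_r(\mathcal{A}) \le 2\,\mathbb{E}_\Omega\,\mathbb{E}_\varepsilon\Big\|\frac{d}{m}\sum_{J \in \Omega} \varepsilon_J\,|\sqrt{d}J)(\sqrt{d}J|\Big\|_{(r)}. \tag{44}$$

We now apply the lemma to the operators $\sqrt{d}J$ ($J \in \Omega$), which have operator norm at most $K$. This gives

$$\begin{aligned}
\mathbb{E}\,\epsilon_r(\mathcal{A}) &\le 2\,\mathbb{E}_\Omega\,C_5\,\frac{d}{m}\Big\|\sum_{J \in \Omega}|\sqrt{d}J)(\sqrt{d}J|\Big\|_{(r)}^{1/2} \\
&\le 2\Big(\mathbb{E}_\Omega\Big\|\sum_{J \in \Omega}|\sqrt{d}J)(\sqrt{d}J|\Big\|_{(r)}\Big)^{1/2}\cdot C_5\,\frac{d}{m} \\
&= 2\big(\mathbb{E}\,\|\mathcal{A}^*\mathcal{A}\|_{(r)}\big)^{1/2}\cdot C_5\sqrt{\frac{d}{m}} \\
&\le 2\big(\mathbb{E}\,\epsilon_r(\mathcal{A}) + 1\big)^{1/2}\cdot C_5'\sqrt{\frac{d}{m}},
\end{aligned} \tag{45}$$

where $C_5' = \sqrt{r}\cdot C_4 K \log^3 d$. The individual steps are justified as follows:

- The second step is Jensen's inequality.
- The third step uses $\mathcal{A}^*\mathcal{A} = \frac{d}{m}\sum_{J \in \Omega}|\sqrt{d}J)(\sqrt{d}J|$.
- The last step uses $\|\mathcal{A}^*\mathcal{A}\|_{(r)} \le \epsilon_r(\mathcal{A}) + \|I\|_{(r)} = \epsilon_r(\mathcal{A}) + 1$. It also uses $m \le d^2$, so that $\log^{1/2} m \le \sqrt{2}\log^{1/2} d$; the factor $\sqrt{2}$ is absorbed into $C_4$.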

To make the notation more concise, define $E_0 = \mathbb{E}\,\epsilon_r(\mathcal{A})$ and $C_0 = 2C_5'\sqrt{d/m}$. Then, squaring both sides and rearranging, we have

$$E_0^2 - C_0^2 E_0 - C_0^2 \le 0. \tag{46}$$

The corresponding quadratic equation has two roots, $a_\pm = \frac{1}{2}\big(C_0^2 \pm C_0\sqrt{C_0^2 + 4}\big)$, and so $E_0$ is bounded by

$$a_- \le 0 \le E_0 \le a_+. \tag{47}$$

We can simplify this bound by writing $a_+ \le \frac{1}{2}\big(C_0^2 + C_0(C_0 + 2)\big) = C_0^2 + C_0$.

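Now we use the fact that $m$ is large. Let $\lambda \ge 1$; we will choose a precise value for $\lambda$ later. Assume that

$$m \ge \lambda d\,(2C_5')^2 = \lambda\cdot 4C_4^2\cdot drK^2\log^6 d. \tag{48}$$

Then $C_0 \le 1/\sqrt{\lambda}$ and $a_+ \le \frac{1}{\lambda} + \frac{1}{\sqrt{\lambda}}$, which gives the desired result:

$$\mathbb{E}\,\epsilon_r(\mathcal{A}) \le \frac{1}{\lambda} + \frac{1}{\sqrt{\lambda}}. \tag{49}$$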
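The passage from (46) to (49) is elementary algebra. The sketch below computes the roots $a_\pm = \frac12\big(C_0^2 \pm C_0\sqrt{C_0^2+4}\big)$ of $x^2 - C_0^2 x - C_0^2 = 0$. It can be used to confirm that $a_- \le 0 \le a_+ \le C_0^2 + C_0$, and that $a_+ \le 1/\lambda + 1/\sqrt{\lambda}$ when $C_0 = 1/\sqrt{\lambda}$. It is a sanity check of the algebra, not part of the proof.

```python
import math


def quadratic_roots(c0):
    """Roots a_- <= a_+ of x**2 - c0**2 * x - c0**2 = 0, as in (46)-(47)."""
    disc = c0 * math.sqrt(c0 ** 2 + 4)
    return 0.5 * (c0 ** 2 - disc), 0.5 * (c0 ** 2 + disc)


def expected_eps_bound(lam):
    """Upper bound (49) on E eps_r(A), valid when C0 <= 1/sqrt(lam)."""
    return 1.0 / lam + 1.0 / math.sqrt(lam)
```

For example, with $\lambda = 4$ (so $C_0 = 1/2$), the upper root is about $0.64$, below both $C_0^2 + C_0 = 0.75$ and $1/\lambda + 1/\sqrt\lambda = 0.75$.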



It remains to show that $\epsilon_r(\mathcal{A})$ is concentrated around its expectation. We will use a concentration inequality from [22] for sums of independent symmetric random variables that take values in some Banach space. Define $X = \sum_{j=1}^m X_j$, where $X_j = \frac{d^2}{m}|S_j)(S_j| - \frac{1}{m}I$. Then $\mathcal{A}^*\mathcal{A} - I = X$ and $\epsilon_r(\mathcal{A}) = \|X\|_{(r)}$.

We showed above that $\mathbb{E}\|X\|_{(r)} \le \frac{1}{\lambda} + \frac{1}{\sqrt{\lambda}}$. In addition, we can bound each $X_j$ as follows. For $Z \in U_2$, we have $|(S_j, Z)| \le \|S_j\|\,\|Z\|_* \le (K/\sqrt{d})\sqrt{r}\,\|Z\|_F \le (K/\sqrt{d})\sqrt{r}$, and therefore



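$$\|X_j\|_{(r)} = \sup_{Z \in U_2}\Big|\frac{d^2}{m}|(S_j, Z)|^2 - \frac{1}{m}(Z,Z)\Big| \le \frac{drK^2 + 1}{m} \le \frac{2drK^2}{\lambda\cdot 4C_4^2\cdot drK^2\log^6 d} = \frac{1}{2\lambda C_4^2\log^6 d}, \tag{50}$$

where the last inequality uses $drK^2 \ge 1$ and (48).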
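The first inequality behind (50), $|(S_j, Z)| \le \|S_j\|\,\|Z\|_*$, can be spot-checked for Pauli measurements. The sketch assumes that each $S_j$ is a Pauli matrix rescaled to unit Frobenius norm, so that $\|S_j\| = 1/\sqrt{d}$ and one may take $K = 1$. It tests rank-one matrices $Z = uv^\dagger$ with unit vectors $u$ and $v$, for which $\|Z\|_* = \|Z\|_F = 1$. The helper names are hypothetical.

```python
import itertools
import math
import random

PAULIS_1Q = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]


def kron(a, b):
    """Kronecker product of square nested-list matrices."""
    m = len(b)
    size = len(a) * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(size)] for i in range(size)]


def normalized_paulis(n_qubits):
    """Pauli products scaled to unit Frobenius norm, so each has operator norm 1/sqrt(d)."""
    d = 2 ** n_qubits
    out = []
    for combo in itertools.product(PAULIS_1Q, repeat=n_qubits):
        mat = [[1]]
        for p in combo:
            mat = kron(mat, p)
        out.append([[complex(x) / math.sqrt(d) for x in row] for row in mat])
    return out


def random_unit_vector(d, rng):
    """A random unit vector in C^d."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(d)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]


def overlap(S, u, v):
    """(S, Z) = tr(S^dagger Z) for the rank-one matrix Z = u v^dagger."""
    d = len(u)
    return sum(S[j][k].conjugate() * u[j] * v[k].conjugate() for j in range(d) for k in range(d))


def worst_scaled_overlap(n_qubits=2, trials=200, seed=0):
    """Max of sqrt(d) * |(S, Z)| over normalized Paulis S and random unit rank-one Z.

    With K = 1 and ||Z||_* = 1, the bound in the text predicts a value of at most 1.
    """
    rng = random.Random(seed)
    d = 2 ** n_qubits
    paulis = normalized_paulis(n_qubits)
    worst = 0.0
    for _ in range(trials):
        u, v = random_unit_vector(d, rng), random_unit_vector(d, rng)
        worst = max(worst, max(math.sqrt(d) * abs(overlap(S, u, v)) for S in paulis))
    return worst
```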



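We use a standard symmetrization argument. Let $X_j'$ denote an independent copy of $X_j$, and define $Y_j = X_j - X_j'$. This variable is symmetric, i.e. $-Y_j$ has the same distribution as $Y_j$. Also define $Y = \sum_{j=1}^m Y_j = X - X'$. Using the triangle inequality, we have

$$\mathbb{E}\|Y\|_{(r)} \le 2\,\mathbb{E}\|X\|_{(r)} \le 2\Big(\frac{1}{\lambda} + \frac{1}{\sqrt{\lambda}}\Big), \tag{51}$$

$$\|Y_j\|_{(r)} \le 2\|X_j\|_{(r)} \le \frac{1}{\lambda C_4^2\log^6 d}. \tag{52}$$

Using equation (6.1) in [22], we have, for any $u > 0$,

$$\Pr\Big[\|X\|_{(r)} > 2\Big(\frac{1}{\lambda} + \frac{1}{\sqrt{\lambda}}\Big) + u\Big] \le \Pr\big[\|X\|_{(r)} > 2\,\mathbb{E}\|X\|_{(r)} + u\big] \le 2\Pr\big[\|Y\|_{(r)} > u\big]. \tag{53}$$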



We will use the following concentration inequality of Ledoux and Talagrand [22]. It is a special case of Theorem 6.17 in [22], obtained by setting $s = t\,\mathbb{E}\|Y\|$ and using equation (6.19) in [22]. The same bound is used in [12].
Theorem A.2 Let $Y_1, \ldots, Y_m$ be independent symmetric random variables taking values in some Banach space. Assume that $\|Y_j\| \le R$ for all $j$. Let $Y = \sum_{j=1}^m Y_j$. Then, for any integers $\ell \ge q$ and any $t > 0$,

$$\Pr\big[\|Y\| > 8q\,\mathbb{E}\|Y\| + 2R\ell + t\,\mathbb{E}\|Y\|\big] \le (C_7/q)^\ell + 2\exp\big(-t^2/(256q)\big), \tag{54}$$

where $C_7$ is some universal constant.



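Now set $q = \lceil eC_7\rceil$. Introduce a new parameter $s \ge \sqrt{q} + 1$, and set $\ell = \lfloor s^2\rfloor$ and $t = s$. We apply Theorem A.2 to $Y$ with $R = 1/(\lambda C_4^2\log^6 d)$, which is valid by (52). We also use $\ell \le s^2$ and $(C_7/q)^\ell \le e^{-\ell} \le e^{-s^2+1}$. This shows that the failure probability is exponentially small in $s$:

$$\Pr\big[\|Y\|_{(r)} > (8q + s)\,\mathbb{E}\|Y\|_{(r)} + 2Rs^2\big] \le e^{-s^2+1} + 2e^{-s^2/(256q)}. \tag{55}$$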

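Then, using (51), (52) and (53), we get

$$\Pr\Big[\|X\|_{(r)} > (1 + 8q + s)\cdot 2\Big(\frac{1}{\lambda} + \frac{1}{\sqrt{\lambda}}\Big) + \frac{2}{\lambda C_4^2\log^6 d}\,s^2\Big] \le 2\big[e^{-s^2+1} + 2e^{-s^2/(256q)}\big]. \tag{56}$$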



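Now let $\lambda \ge (1 + 8q)^4\cdot\max\{(16/\epsilon)^4,\ (4/(C_4^2\epsilon\log^6 d))^2\}$; since $\epsilon \le 1$, this ensures $\lambda \ge 1$, as required. Then set $s = \lambda^{1/4}$, so that $s \ge 1 + 8q \ge \sqrt{q} + 1$, as required. Then we can write

$$(1 + 8q + s)\cdot 2\Big(\frac{1}{\lambda} + \frac{1}{\sqrt{\lambda}}\Big) + \frac{2s^2}{\lambda C_4^2\log^6 d} \le 2s\cdot\frac{4}{\sqrt{\lambda}} + \frac{2}{C_4^2\sqrt{\lambda}\log^6 d} = \frac{8}{\lambda^{1/4}} + \frac{2}{C_4^2\sqrt{\lambda}\log^6 d} \le \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon. \tag{57}$$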

Plugging this into the previous inequality, we have

$$\Pr\big[\|X\|_{(r)} > \epsilon\big] \le e^{-\Omega(s^2)} = e^{-\Omega(\sqrt{\lambda})}. \tag{58}$$

Therefore $\epsilon_r(\mathcal{A}) \le \epsilon$, with a failure probability that decreases exponentially in $\sqrt{\lambda}$. This completes the proof.
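The parameter bookkeeping at the end of the proof can be checked numerically. The sketch below uses natural logarithms and the following choices:

- $q = \lceil eC_7\rceil$;
- $\lambda = (1+8q)^4\max\{(16/\epsilon)^4,\ (4/(C_4^2\epsilon\log^6 d))^2\}$;
- $s = \lambda^{1/4}$;
- $R = 1/(\lambda C_4^2\log^6 d)$ in Theorem A.2.

The universal constants $C_7$ and $C_4$ are not specified, so the sketch plugs in hypothetical placeholder values. Its output says nothing about their true size.

```python
import math

# Hypothetical placeholder values for the unspecified universal constants C_7 and C_4.
C7_HYPOTHETICAL = 2.0
C4_HYPOTHETICAL = 1.0


def choose_parameters(eps, d, c7=C7_HYPOTHETICAL, c4=C4_HYPOTHETICAL):
    """Return (q, lam, s) as chosen above; requires d >= 2 and 0 < eps <= 1."""
    q = math.ceil(math.e * c7)
    log6 = math.log(d) ** 6
    lam = (1 + 8 * q) ** 4 * max((16 / eps) ** 4, (4 / (c4 ** 2 * eps * log6)) ** 2)
    return q, lam, lam ** 0.25


def deviation_level(eps, d, c7=C7_HYPOTHETICAL, c4=C4_HYPOTHETICAL):
    """Left-hand side of (57), i.e. the threshold appearing in (56)."""
    q, lam, s = choose_parameters(eps, d, c7, c4)
    spread = (1 + 8 * q + s) * 2 * (1 / lam + 1 / math.sqrt(lam))
    jumps = 2 * s ** 2 / (lam * c4 ** 2 * math.log(d) ** 6)
    return spread + jumps


def failure_probability(eps, d, c7=C7_HYPOTHETICAL, c4=C4_HYPOTHETICAL):
    """Right-hand side of (56) at the chosen s (it may underflow to 0.0)."""
    q, _, s = choose_parameters(eps, d, c7, c4)
    return 2 * (math.exp(-s ** 2 + 1) + 2 * math.exp(-s ** 2 / (256 * q)))


def tail_term_ok(eps, d, c7=C7_HYPOTHETICAL, c4=C4_HYPOTHETICAL):
    """Check l = floor(s^2) >= q and (C7/q)^l <= exp(-s^2 + 1), as used for (55)."""
    q, _, s = choose_parameters(eps, d, c7, c4)
    ell = math.floor(s ** 2)
    return ell >= q and (c7 / q) ** ell <= math.exp(-s ** 2 + 1)
```

These thresholds are far from tight. They only force each of the two terms on the left of (57) below $\epsilon/2$, and for realistic parameters the failure probability underflows to zero.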

A.2 Proof of Lemma 3.7 (bounding a Rademacher sum in $(r)$-norm)

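Let $L_0 = \mathbb{E}_\varepsilon\|\sum_{i=1}^m \varepsilon_i|V_i)(V_i|\|_{(r)}$; this is the quantity we want to bound. We can upper-bound it by replacing the $\pm 1$ random variables $\varepsilon_1, \ldots, \varepsilon_m$ with iid $N(0,1)$ Gaussian random variables $g_1, \ldots, g_m$ (see Lemma 4.5 and equation (4.8) in [22]). This gives

$$L_0 \le \sqrt{\frac{\pi}{2}}\,\mathbb{E}_g\Big\|\sum_{i=1}^m g_i|V_i)(V_i|\Big\|_{(r)}. \tag{59}$$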



Using the definition of the norm $\|\cdot\|_{(r)}$ (equation (33)), we have

$$L_0 \le \sqrt{\frac{\pi}{2}}\,\mathbb{E}_g\sup_{X \in U_2}|G(X)|, \qquad G(X) = \sum_{i=1}^m g_i|(V_i, X)|^2. \tag{60}$$



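The random variables $G(X)$ (indexed by $X \in U_2$) form a Gaussian process, and $L_0$ is upper-bounded by the expected supremum of this process. In particular, using the facts that $G(0) = 0$ and that $G(\cdot)$ is symmetric (see [22], p. 298), we have

$$\begin{aligned}
L_0 &\le \sqrt{\frac{\pi}{2}}\,\mathbb{E}_g\sup_{X \in U_2}|G(X) - G(0)| \le \sqrt{\frac{\pi}{2}}\,\mathbb{E}_g\sup_{X,Y \in U_2}|G(X) - G(Y)| \\
&= \sqrt{\frac{\pi}{2}}\,\mathbb{E}_g\sup_{X,Y \in U_2}\big(G(X) - G(Y)\big) = \sqrt{2\pi}\,\mathbb{E}_g\sup_{X \in U_2}G(X).
\end{aligned} \tag{61}$$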



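Using Dudley's inequality (Theorem 11.17 in [22]), we have

$$\mathbb{E}_g\sup_{X \in U_2}G(X) \le 24\int_0^\infty \log^{1/2}N(U_2, d_G, \epsilon)\,d\epsilon, \tag{62}$$

where $N(U_2, d_G, \epsilon)$ is a covering number: the number of balls in $\mathbb{C}^{d\times d}$ of radius $\epsilon$ in the metric $d_G$ that are needed to cover the set $U_2$. The metric $d_G$ is given by

$$d_G(X, Y) = \big(\mathbb{E}[(G(X) - G(Y))^2]\big)^{1/2}. \tag{63}$$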
We can simplify the metric $d_G$ using the fact that $\mathbb{E}[g_ig_j] = 1$ when $i = j$ and $0$ otherwise:

$$d_G(X, Y) = \Big(\mathbb{E}\Big[\Big(\sum_{i=1}^m g_i\big(|(V_i, X)|^2 - |(V_i, Y)|^2\big)\Big)^2\Big]\Big)^{1/2} = \Big(\sum_{i=1}^m\big(|(V_i, X)|^2 - |(V_i, Y)|^2\big)^2\Big)^{1/2}. \tag{64}$$



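Define a new norm (actually a seminorm) $\|\cdot\|_X$ on $\mathbb{C}^{d\times d}$ as follows:

$$\|M\|_X = \max_{i=1,\ldots,m}|(V_i, M)|. \tag{65}$$

Note that²

$$\big||(V_i, X)|^2 - |(V_i, Y)|^2\big| \le \big(|(V_i, X)| + |(V_i, Y)|\big)\cdot|(V_i, X) - (V_i, Y)| \le \big(|(V_i, X)| + |(V_i, Y)|\big)\cdot\|X - Y\|_X. \tag{66}$$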
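Inequality (66) rests on the elementary identity in footnote 2. The sketch below spot-checks, on random complex pairs, that $|a|^2 - |b|^2 = \mathrm{Re}\big(\overline{(a+b)}(a-b)\big)$, and hence that $\big||a|^2 - |b|^2\big| \le |a+b|\,|a-b|$. The function names are illustrative.

```python
import random


def identity_residual(a, b):
    """|(|a|^2 - |b|^2) - Re(conj(a + b) * (a - b))|, which should vanish."""
    return abs((abs(a) ** 2 - abs(b) ** 2) - ((a + b).conjugate() * (a - b)).real)


def bound_slack(a, b):
    """|a + b| * |a - b| - abs(|a|^2 - |b|^2), which should be nonnegative."""
    return abs(a + b) * abs(a - b) - abs(abs(a) ** 2 - abs(b) ** 2)


def check_footnote(n_samples=1000, seed=0):
    """Return (largest identity residual, smallest bound slack) over random complex pairs."""
    rng = random.Random(seed)
    worst_residual, worst_slack = 0.0, float("inf")
    for _ in range(n_samples):
        a = complex(rng.gauss(0, 1), rng.gauss(0, 1))
        b = complex(rng.gauss(0, 1), rng.gauss(0, 1))
        worst_residual = max(worst_residual, identity_residual(a, b))
        worst_slack = min(worst_slack, bound_slack(a, b))
    return worst_residual, worst_slack
```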

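Combining (66) with the triangle inequality in $\ell_2$ gives a simpler upper bound on the metric $d_G$:

$$\begin{aligned}
d_G(X, Y) &\le \Big(\sum_{i=1}^m\big(|(V_i, X)| + |(V_i, Y)|\big)^2\Big)^{1/2}\|X - Y\|_X \\
&\le 2\sup_{Z \in U_2}\Big(\sum_{i=1}^m|(V_i, Z)|^2\Big)^{1/2}\|X - Y\|_X \\
&= 2\Big\|\sum_{i=1}^m|V_i)(V_i|\Big\|_{(r)}^{1/2}\|X - Y\|_X.
\end{aligned} \tag{67}$$

Note that the last step holds for all $X, Y \in U_2$. To simplify the notation, let $R = \|\sum_{i=1}^m|V_i)(V_i|\|_{(r)}^{1/2}$. Then $d_G(X, Y) \le 2R\|X - Y\|_X$.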

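This lets us upper-bound the covering numbers in $d_G$ by covering numbers in $\|\cdot\|_X$:

$$N(U_2, d_G, \epsilon) \le N\Big(U_2, \|\cdot\|_X, \frac{\epsilon}{2R}\Big) = N\Big(\frac{1}{\sqrt{r}}U_2, \|\cdot\|_X, \frac{\epsilon}{2R\sqrt{r}}\Big). \tag{68}$$

Plugging this into (62) and changing variables, we get

$$L_0 \le 48\sqrt{2\pi}\,R\sqrt{r}\int_0^\infty \log^{1/2}N\Big(\frac{1}{\sqrt{r}}U_2, \|\cdot\|_X, \epsilon\Big)\,d\epsilon. \tag{69}$$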

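We will now bound these covering numbers. First, we introduce some notation. Let $\|\cdot\|_p$ denote the Schatten $p$-norm on $\mathbb{C}^{d\times d}$, and let $B_p$ be the unit ball in this norm. Also, let $B_X$ be the unit ball in the $\|\cdot\|_X$ norm.

Observe that

$$\frac{1}{\sqrt{r}}U_2 \subseteq B_1 \subseteq K\cdot B_X. \tag{70}$$

The first inclusion holds because $\|X\|_* \le \sqrt{r}\|X\|_F \le \sqrt{r}$ for $X \in U_2$. The second holds because $\|M\|_X \le \max_{i=1,\ldots,m}\|V_i\|\,\|M\|_* \le K\|M\|_*$. This gives a simple bound on the covering numbers:

$$N\Big(\frac{1}{\sqrt{r}}U_2, \|\cdot\|_X, \epsilon\Big) \le N(B_1, \|\cdot\|_X, \epsilon) \le N(K\cdot B_X, \|\cdot\|_X, \epsilon). \tag{71}$$

This equals 1 when $\epsilon \ge K$, so in equation (69) we can restrict the integral to the interval $[0, K]$.

When $\epsilon$ is small, we will use the following simple bound, which is equation (5.7) in [23] (this is equation (20)):

$$N(K\cdot B_X, \|\cdot\|_X, \epsilon) \le \Big(1 + \frac{2K}{\epsilon}\Big)^{2d^2}. \tag{72}$$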



² Note that, for any complex numbers $a$ and $b$, $|a|^2 - |b|^2 = \frac{1}{2}\overline{(a+b)}(a-b) + \frac{1}{2}(a+b)\overline{(a-b)}$, so $\big||a|^2 - |b|^2\big| \le |a+b|\cdot|a-b|$.
When $\epsilon$ is large, we will use a more sophisticated bound based on Maurey's empirical method and entropy duality, which is due to [16] (see also [17]). This is equation (21):

$$N(B_1, \|\cdot\|_X, \epsilon) \le \exp\Big(\frac{C_1K^2}{\epsilon^2}\log^3 d\,\log m\Big), \quad \text{for some constant } C_1. \tag{73}$$
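We defer the proof of (73) to the next section. Here, we proceed to bound the integral in (69). Let $A = K/d$. For the integral over $[0, A]$, we write

$$L_1 := \int_0^A \log^{1/2}N\Big(\frac{1}{\sqrt{r}}U_2, \|\cdot\|_X, \epsilon\Big)d\epsilon \le \sqrt{2}\,d\int_0^A \log^{1/2}\Big(1 + \frac{2K}{\epsilon}\Big)d\epsilon \le \sqrt{2}\,d\int_0^A\Big(1 + \log\Big(1 + \frac{2K}{\epsilon}\Big)\Big)d\epsilon = \sqrt{2}\,d\cdot A + \sqrt{2}\,d\cdot L_1', \tag{74}$$

where

$$L_1' = \int_0^A \log\Big(1 + \frac{2K}{\epsilon}\Big)d\epsilon = \int_{1/A}^\infty \log(1 + 2Ky)\,\frac{dy}{y^2} \le \int_{1/A}^\infty \log\big((A + 2K)y\big)\frac{dy}{y^2} = \log(A + 2K)\int_{1/A}^\infty\frac{dy}{y^2} + \int_{1/A}^\infty \log y\,\frac{dy}{y^2}. \tag{75}$$

Integrating by parts, we get

$$L_1' \le A\log(A + 2K) + A\log\frac{1}{A} + A = A\log\Big(1 + \frac{2K}{A}\Big) + A, \tag{76}$$

and substituting back in,

$$L_1 \le \sqrt{2}\,dA\Big(2 + \log\Big(1 + \frac{2K}{A}\Big)\Big) = \sqrt{2}\,K\big(2 + \log(1 + 2d)\big). \tag{77}$$

For the integral over $[A, K]$, we write

$$L_2 := \int_A^K \log^{1/2}N\Big(\frac{1}{\sqrt{r}}U_2, \|\cdot\|_X, \epsilon\Big)d\epsilon \le \int_A^K \frac{C_1^{1/2}K}{\epsilon}\log^{3/2}d\,\log^{1/2}m\,d\epsilon = C_1^{1/2}K\log^{3/2}d\,\log^{1/2}m\,\log\frac{K}{A} = C_1^{1/2}K\log^{5/2}d\,\log^{1/2}m. \tag{78}$$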



Finally, substituting into (69), we get

$$L_0 \le 48\sqrt{2\pi}\,R\sqrt{r}\,(L_1 + L_2) \le C_4R\sqrt{r}\,K\log^{5/2}d\,\log^{1/2}m, \tag{79}$$

where $C_4$ is some universal constant. This proves the lemma.
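The calculus in (74) to (77) can be checked numerically. Evaluating the antiderivative gives the exact value $L_1' = A\log(1 + 2K/A) + 2K\log(1 + A/(2K))$. This is at most the bound (76), because $2K\log(1 + A/(2K)) \le A$. The sketch below compares the closed form with a midpoint-rule quadrature. It also compares a quadrature of the middle expression in (74) with the bound (77), for the choice $A = K/d$ used in the text.

```python
import math


def l1_prime_exact(A, K):
    """Closed form of int_0^A log(1 + 2K/eps) d eps."""
    return A * math.log(1 + 2 * K / A) + 2 * K * math.log(1 + A / (2 * K))


def l1_prime_bound(A, K):
    """Right-hand side of (76)."""
    return A * math.log(1 + 2 * K / A) + A


def midpoint_integral(f, lo, hi, n=200_000):
    """Composite midpoint rule; never evaluates f at the endpoints."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))


def l1_middle(K, d, n=200_000):
    """sqrt(2) * d * int_0^{K/d} sqrt(log(1 + 2K/eps)) d eps, the middle expression in (74)."""
    A = K / d
    return math.sqrt(2) * d * midpoint_integral(
        lambda e: math.sqrt(math.log(1 + 2 * K / e)), 0.0, A, n)


def l1_bound(K, d):
    """Right-hand side of (77)."""
    return math.sqrt(2) * K * (2 + math.log(1 + 2 * d))
```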

A.3 Proof of Equation (21) (covering numbers of the nuclear-norm ball)

Our result will follow easily from a bound on covering numbers introduced in [16], where it appears as Lemma 1:

Lemma A.3 Let $E$ be a Banach space having modulus of convexity of power type 2 with constant $\lambda(E)$. Let $E^*$ be the dual space, and let $T_2(E^*)$ denote its type 2 constant. Let $B_E$ denote the unit ball in $E$.

Let $V_1, \ldots, V_m \in E^*$ be such that $\|V_j\|_{E^*} \le K$ (for all $j$). Define the following norm on $E$:

$$\|M\|_X = \max_{j=1,\ldots,m}|(V_j, M)|. \tag{80}$$

Then, for any $\epsilon > 0$,

$$\epsilon\log^{1/2}N(B_E, \|\cdot\|_X, \epsilon) \le C_2\,\lambda(E)^2\,T_2(E^*)\,K\log^{1/2}m, \tag{81}$$

where $C_2$ is some universal constant.
The proof uses entropy duality to reduce the problem to bounding the "dual" covering number. The basic idea is as follows. Let $\ell_p^m$ denote the complex vector space $\mathbb{C}^m$ with the $\ell_p$ norm. Consider the map $S: \ell_1^m \to E^*$ that takes the $j$th coordinate vector to $V_j$. Let $N(S)$ denote the number of balls in $E^*$ needed to cover the image, under the map $S$, of the unit ball in $\ell_1^m$. We can bound $N(S)$ using Maurey's empirical method. Also define the dual map $S^*: E \to \ell_\infty^m$, and the associated dual covering number $N(S^*)$. Then $N(B_E, \|\cdot\|_X, \epsilon)$ is related to $N(S^*)$. Finally, $N(S)$ and $N(S^*)$ are related via entropy duality inequalities. See [16] for details.

We will apply this lemma as follows, using the same approach as [17]. Let $S_p$ denote the Banach space consisting of all matrices in $\mathbb{C}^{d\times d}$ with the Schatten $p$-norm. Intuitively, we want to set $E = S_1$ and $E^* = S_\infty$, but this does not work because $\lambda(S_1)$ is infinite. Instead, we let $E = S_p$ with $p = (\log d)/(\log d - 1)$, and $E^* = S_q$ with $q = \log d$. Note that $\|M\|_p \le \|M\|_*$, hence $B_1 \subseteq B_p$ and

$$\epsilon\log^{1/2}N(B_1, \|\cdot\|_X, \epsilon) \le \epsilon\log^{1/2}N(B_p, \|\cdot\|_X, \epsilon). \tag{82}$$

Also, we have $\lambda(E) \le 1/\sqrt{p - 1} = \sqrt{\log d - 1}$ and $T_2(E^*) \le \lambda(E) \le \sqrt{\log d - 1}$ (see the Appendix in [17]). Note that $\|M\|_q \le e\|M\|$, and thus $\|V_j\|_q \le eK$ (for all $j$). Then, using the lemma, we have

$$\epsilon\log^{1/2}N(B_p, \|\cdot\|_X, \epsilon) \le C_2\log^{3/2}d\,(eK)\log^{1/2}m. \tag{83}$$

Together with (82), this proves the claim (73), with $C_1 = e^2C_2^2$.
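Both norm comparisons used in (82) and (83) depend only on singular values. The first is $\|M\|_p \le \|M\|_1 = \|M\|_*$ for $p \ge 1$. The second is $\|M\|_q \le d^{1/q}\|M\| = e\|M\|$ when $q = \log d$; this sketch assumes the logarithm is natural. The sketch below checks both comparisons on random nonnegative spectra. It also checks $1/p + 1/q = 1$ and $1/\sqrt{p-1} = \sqrt{\log d - 1}$. It requires $d > e$, so that $p$ is finite.

```python
import math
import random


def schatten_norm(singular_values, p):
    """Schatten p-norm from the singular values; p = math.inf gives the operator norm."""
    if p == math.inf:
        return max(singular_values)
    return sum(s ** p for s in singular_values) ** (1.0 / p)


def exponents(d):
    """(p, q) = (log d / (log d - 1), log d) with natural log; needs d > e."""
    q = math.log(d)
    if q <= 1:
        raise ValueError("need d > e so that p is finite")
    return q / (q - 1), q


def norm_comparisons_hold(d, trials=200, seed=0):
    """Check ||M||_p <= ||M||_1 and ||M||_q <= e * ||M||_inf on random spectra of length d."""
    rng = random.Random(seed)
    p, q = exponents(d)
    for _ in range(trials):
        sv = [rng.random() ** 3 for _ in range(d)]  # arbitrary nonnegative singular values
        if schatten_norm(sv, p) > schatten_norm(sv, 1) * (1 + 1e-12):
            return False
        if schatten_norm(sv, q) > math.e * schatten_norm(sv, math.inf) * (1 + 1e-12):
            return False
    return True
```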

B Proof of Proposition 2.4 (recovery of a full-rank matrix)

In this section we sketch the proof of Proposition 2.3. We use the same argument as Theorem 2.8 in [5], adapted for Pauli (rather than Gaussian) measurements.

A crucial ingredient is the NNQ ("nuclear norm quotient") property of a sampling operator $\mathcal{A}$. This property was introduced in [5] and is analogous to the LQ ("$\ell_1$-quotient") property in compressed sensing [26]. We say that a sampling operator $\mathcal{A}: \mathbb{C}^{d\times d} \to \mathbb{C}^m$ satisfies the NNQ($\alpha$) property if

$$\mathcal{A}(B_1) \supseteq \alpha B_2, \tag{84}$$

where $B_1$ is the unit ball of the nuclear norm in $\mathbb{C}^{d\times d}$, and $B_2$ is the unit ball of the $\ell_2$ (Euclidean) norm in $\mathbb{C}^m$.

It is easy to see that the Pauli sampling operator $\mathcal{A}$ (as defined in the main text) satisfies NNQ($\alpha$) with $\alpha = \sqrt{d/m}$. Without loss of generality, suppose that the Pauli matrices $S_1, \ldots, S_m$ used to construct $\mathcal{A}$ are all distinct. Let $\alpha = \sqrt{d/m}$ and choose any $y \in \alpha B_2$. Let $X = \sum_{i=1}^m \frac{\sqrt{m}}{d}y_iS_i$, so that $\mathcal{A}(X) = y$. Observe that $\|X\|_* \le \sqrt{d}\|X\|_F = \sqrt{m/d}\,\|y\|_2 \le 1$, as desired. We remark that this value of $\alpha$ is probably not optimal. If one could prove that $\mathcal{A}$ satisfies NNQ($\alpha$) with a larger $\alpha$, this would improve the bound in Proposition 2.3.
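The NNQ argument above can be traced on a single qubit. The sketch assumes the normalization $\mathcal{A}(X)_i = \frac{d}{\sqrt{m}}\operatorname{tr}(S_i^\dagger X)$, where the $S_i$ are distinct Paulis rescaled to unit Frobenius norm. This normalization is inferred from $\mathcal{A}^*\mathcal{A} = \frac{d^2}{m}\sum_j|S_j)(S_j|$ in Appendix A; it is not quoted from the main text. The sketch builds $X = \sum_i \frac{\sqrt{m}}{d}y_iS_i$ for a random $y \in \alpha B_2$ with $\alpha = \sqrt{d/m}$. It then checks that $\mathcal{A}(X) = y$ and that $\sqrt{d}\|X\|_F \le 1$, which implies $\|X\|_* \le 1$.

```python
import math
import random

# Normalized single-qubit Paulis S = P / sqrt(2) (d = 2); X, Y and Z are distinct and orthonormal.
D = 2
PAULIS = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
S = [[[complex(x) / math.sqrt(D) for x in row] for row in P] for P in PAULIS]


def hs_inner(A, B):
    """Hilbert-Schmidt inner product tr(A^dagger B)."""
    return sum(A[i][j].conjugate() * B[i][j] for i in range(D) for j in range(D))


def sampling_operator(X, ops):
    """Assumed normalization: A(X)_i = (d / sqrt(m)) * tr(S_i^dagger X)."""
    m = len(ops)
    return [D / math.sqrt(m) * hs_inner(Si, X) for Si in ops]


def preimage(y, ops):
    """X = sum_i (sqrt(m)/d) y_i S_i, which satisfies A(X) = y when the S_i are orthonormal."""
    m = len(ops)
    X = [[0j] * D for _ in range(D)]
    for yi, Si in zip(y, ops):
        for r in range(D):
            for c in range(D):
                X[r][c] += math.sqrt(m) / D * yi * Si[r][c]
    return X


def frobenius(X):
    """Frobenius norm of a nested-list matrix."""
    return math.sqrt(sum(abs(x) ** 2 for row in X for x in row))


def random_point_in_ball(m, radius, rng):
    """A random vector in C^m with Euclidean norm at most radius."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(m)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    scale = radius * rng.random() / norm
    return [scale * x for x in v]


def nnq_check(seed=0):
    """Return (max |A(X)_i - y_i|, sqrt(d) * ||X||_F) for a random y in alpha * B_2."""
    rng = random.Random(seed)
    m = len(S)
    alpha = math.sqrt(D / m)
    y = random_point_in_ball(m, alpha, rng)
    X = preimage(y, S)
    residual = max(abs(a - b) for a, b in zip(sampling_operator(X, S), y))
    # sqrt(d) * ||X||_F upper-bounds the nuclear norm ||X||_* (Cauchy-Schwarz on singular values).
    return residual, math.sqrt(D) * frobenius(X)
```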

We will need one more property of $\mathcal{A}$. For any fixed matrix $M \in \mathbb{C}^{d\times d}$, which is not necessarily low-rank, we want almost all random choices of $\mathcal{A}$ to satisfy

$$\|\mathcal{A}(M)\|_2^2 \le 1.5\|M\|_F^2. \tag{85}$$

Note that this inequality is required to hold only for this one particular matrix $M$. In our case (random Pauli measurements), it is easy to check that $\mathcal{A}$ obeys this property as well.

The proof of Theorem 2.8 in [5] actually implies the following more general statement about low-rank matrix recovery when $\mathcal{A}$ satisfies both RIP and NNQ:

Theorem B.1 Let $M$ be any matrix in $\mathbb{C}^{d\times d}$, and let $\sigma_1(M) \ge \sigma_2(M) \ge \cdots \ge \sigma_d(M) \ge 0$ be its singular values. Write $M = M_r + M_c$, where $M_r$ contains the $r$ largest singular values of $M$. Also write $M = M_q + M_{\bar{q}}$, where $M_q$ contains only those singular values of $M$ that exceed $\lambda = 16\sqrt{d}\,\sigma$.

Suppose the sampling operator $\mathcal{A}: \mathbb{C}^{d\times d} \to \mathbb{C}^m$ satisfies RIP (for rank-$r$ matrices in $\mathbb{C}^{d\times d}$) and NNQ($\alpha$) with $\alpha = \mu\sqrt{d/m}$. Furthermore, suppose that $\mathcal{A}$ satisfies $\|\mathcal{A}(M_c)\|_2^2 \le 1.5\|M_c\|_F^2$ and $\|\mathcal{A}(M_{\bar{q}})\|_2^2 \le 1.5\|M_{\bar{q}}\|_F^2$.
Say we observe $y = \mathcal{A}(M) + z$, where $z \sim N(0, \sigma^2 I)$. Let $\hat{M}$ be the Dantzig selector with $\lambda = 16\sqrt{d}\,\sigma$, or the Lasso with regularization parameter $32\sqrt{d}\,\sigma$ (both as defined in the main text). Then, with high probability over the choice of $\mathcal{A}$ and the noise $z$,

$$\|\hat{M} - M\|_F^2 \le C_0\sum_{i=1}^d\min\big(\sigma_i^2(M), d\sigma^2\big) + \Big(C_1 + \frac{C_2}{\mu^2}\Big)\sum_{i=r+1}^d\sigma_i^2(M), \tag{86}$$

where $C_0$, $C_1$ and $C_2$ are absolute constants.

To prove Theorem B.1, one follows the proof of Theorem 2.8 in [5], with a slight modification to Lemma 3.10 in [5]. One obtains the more general bound

$$\|\hat{M} - M\|_F \le C_0\lambda\sqrt{\operatorname{rank}(M_q)} + \Big(C_1 + \frac{C_2}{\mu}\Big)\|\mathcal{A}(M_c)\|_2 + \|M_{\bar{q}}\|_F. \tag{87}$$

Combining Theorem B.1 with the preceding facts gives us Proposition 2.3.






